Abstract: Infinite population models are important tools for studying population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this paper, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study were in fact problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, in order to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements which take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of the stacking of operators and of iterating the algorithm, which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the k-ary recombination operator. We show that the infinite population models of these operators provide accurate predictions of real population dynamics as the population size goes to infinity, provided that the initial population is identically and independently distributed.

Index Terms: Evolutionary algorithms, infinite population models, population dynamics, convergence in distribution, theoretical analysis

I. Introduction 

Evolutionary algorithms (EAs) are general-purpose optimization algorithms that have achieved great success in real-world applications. They are inspired by the evolutionary process in nature. A certain number of candidate solutions to the problem at hand are modeled as individuals in a population, and over the generations the algorithm evolves the population by producing new individuals and selectively replacing the old ones. The idea is that the survival probabilities of individuals in the population are related to their objective function values, or fitness values in this context. In general, individuals with more preferable objective function values or higher fitness values are more likely to survive and remain in the next generation. As a result, by the “survival of the fittest” principle, it is likely that after many generations the population will contain individuals with sufficiently high fitness values, such that these individuals are satisfactory solutions to the problem at hand.

Bo Song and Victor O.K. Li are with the Department of Electrical and 
Electronic Engineering, the University of Hong Kong, Pokfulam, Hong Kong 
(e-mail: bsong@connect.hku.hk; vli@eee.hku.hk). 


Though conceptually simple, the underlying evolutionary processes and the behaviors of EAs remain to be fully understood. The difficulty lies in the fact that EAs are customizable population-based iterative stochastic algorithms, and the objective function also has great influence on their behaviors. A successful model of EAs should account for both the mechanisms of the algorithm and the influence of the objective function. One way to derive such models is to study EAs as dynamical systems. The idea is to first pick a quantity of interest, such as the distribution of the population or a certain statistic of it. Then, transitions in the state space of all possible values of the chosen quantity are studied. A Markov chain described by a transition matrix (when the state space is finite) or a difference equation (when the state space is not finite) is derived to describe how the chosen quantity changes between consecutive generations.

Although the dynamical system approach brings many insights into EAs, the state spaces of the models tend to grow rapidly as the population size increases. This is because, in order to characterize the population dynamics accurately, the state space of the model has to be large enough to describe all the interdependencies between individuals in the current and next generations. As a result, even for time-homogeneous EAs with moderate population size, the dynamical system model is often too large and too complex to be analyzed or simulated. To overcome this issue, some researchers instead study the limiting behaviors of EAs as the population size goes to infinity. The idea is to exploit some kind of symmetry in the state space (such as all individuals having the same marginal distribution) and prove that in the limit the Markov chain can be described by a more compact model. The models built in this way are called infinite population models (IPMs).

In this paper, we follow this line of research and study IPMs of EAs on continuous space. More specifically, we aim at rigorously proving the convergence of IPMs. Note that in this study, by convergence we usually mean a property of IPMs. Loosely speaking, an IPM converges if, as the population size goes to infinity, the population dynamics of the real EA converge in some sense to the population dynamics predicted by this model. This usage differs from the conventional one, where convergence means that the EA eventually locates and gets stuck in some local or global optima. Convergence results guarantee that IPMs characterize some kind of limiting behavior of real EAs. They are the foundations and justifications of IPMs.

To our knowledge, there are very few research efforts which directly studied the convergence of IPMs. Among them, the studies of Qi et al. in [1], [2] are the classic and most relevant ones. Qi et al. studied the population dynamics of the simple EA on continuous space. In the first part of their research [1], the authors built an IPM to analyze the population dynamics of the simple EA with proportionate selection and mutation. Traditionally, a transition equation is constructed to describe how the probability density functions (p.d.f.s) of the joint distributions of individuals change between consecutive generations. The novelty of the authors’ research lies in their introduction of the modeling assumption that individuals in the same generation are exchangeable, and therefore they all have the same marginal distribution. Then, as a key result, the authors proved that as the population size goes to infinity, the marginal p.d.f.s of the populations produced by the real algorithm converge point-wise to the p.d.f.s predicted by the following transition equation:

$$\mu_{k+1}(x) = \frac{\int_F \mu_k(y)\, g(y)\, f_w(x \mid y)\, dy}{\int_F \mu_k(y)\, g(y)\, dy}, \tag{1}$$

where F is the solution space, $\mu_k$ is the predicted marginal p.d.f. of the kth generation, g is the objective function to be maximized, and $f_w(x \mid y)$ is the conditional p.d.f. determined by the mutation operator. Though the transition equation of marginal distributions loses information about the interdependency between individuals, it has a simpler form and can still provide a relatively complete description of the population. Moreover, as proved in [1], it is accurate in the limiting case when the population size goes to infinity. Furthermore, in the second part of the research [2], the authors analyzed the crossover operator and modified the transition equation to include all three operators in the simple EA. Overall, the studies of Qi et al. are inspiring, especially the idea of combining the modeling assumption that individuals are exchangeable with the mathematical analysis of point-wise convergence of p.d.f.s as the population size goes to infinity.
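To make the transition equation (1) concrete, the following Python sketch evaluates one step of it numerically. It is our illustration, not code from Qi et al.: the one-dimensional solution space truncated to [-6, 6], the Gaussian mutation kernel, the initial marginal $\mu_0 = N(0, 1)$ and the fitness $g(x) = 2 + \sin x$ are all assumptions chosen for the example, and both integrals are replaced by Riemann sums on a uniform grid with step 0.05.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Assumed mutation kernel f_w(x | y): Gaussian density centred at the parent y."""
    return math.exp(-0.5 * ((x - y) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def transition_step(mu, grid, fitness, sigma):
    """Approximate one step of the transition equation with Riemann sums.

    mu      -- values of the marginal p.d.f. mu_k at the grid points
    grid    -- equally spaced points discretising the solution space F
    fitness -- objective g, assumed strictly positive on F
    sigma   -- standard deviation of the assumed Gaussian mutation kernel
    """
    h = grid[1] - grid[0]
    weighted = [m * fitness(y) for m, y in zip(mu, grid)]  # mu_k(y) g(y)
    denom = sum(weighted) * h                              # integral of mu_k(y) g(y) dy
    return [
        sum(w * gaussian_kernel(x, y, sigma) for w, y in zip(weighted, grid)) * h / denom
        for x in grid
    ]

def fitness(x):
    """Hypothetical objective, bounded in [1, 3] so that g_min > 0."""
    return 2.0 + math.sin(x)

H = 0.05
grid = [-6.0 + H * i for i in range(241)]
mu0 = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in grid]  # N(0, 1)
mu1 = transition_step(mu0, grid, fitness, sigma=0.5)

mass = sum(mu1) * H                               # total probability, close to 1
mean = sum(x * m for x, m in zip(grid, mu1)) * H  # selection shifts the mean towards larger g
print(f"mass = {mass:.4f}, mean = {mean:.4f}")
```

For this particular choice, Stein's identity $\mathbb{E}[y\,h(y)] = \mathbb{E}[h'(y)]$ for a standard normal y gives the exact mean of $\mu_1$ as $\mathbb{E}[y \sin y]/2 = e^{-1/2}/2 \approx 0.303$, because the zero-mean mutation does not move the mean; the grid value should be close to it.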

However, as will be shown in Section II, the convergence proof for (1) in [1] is problematic. We provide a counterexample to show that a key assertion in the authors’ proof about the law of large numbers (LLN) for exchangeable random vectors is generally not true. Therefore, the whole proof is unsound. Furthermore, we show that the modeling assumption of exchangeability of individuals cannot yield the transition equation in general. This means that under the authors’ modeling assumption, the conclusion (1) cannot be reached.

In addition to the aforementioned problems, we also show that the authors’ proofs in both [1] and [2] are incomplete. The authors did not address the convergence of the stacking of operators and of recursively iterating the algorithm. In essence, the authors only attempted to prove the convergence of the IPM for one iteration step. Even if the proof for (1) is correct, it only shows that the marginal p.d.f. of the (k+1)th population produced by the real algorithm converges point-wise to $\mu_{k+1}(x)$ calculated by (1), provided that the marginal p.d.f. of the kth generation is $\mu_k(x)$ and assuming that the population size goes to infinity. However, this convergence does not automatically hold for all subsequent generations. In fact, it rarely holds, because $\mu_{k+1}(x)$ is only accurate in the limit. Compared with the finite-sized populations produced by the real algorithm, it inevitably contains errors. As a result, (1) cannot simply be iterated to make predictions for the generations after the (k+1)th.

Besides [1], [2], we found no other studies which attempted to prove the convergence of IPMs for EAs on continuous space. Therefore, to fill this research gap, in Section III we propose a general analytical framework. The novelty of our framework is that, from the very start of the analysis, we model generations of the population as random elements taking values in the metric space of infinite sequences, and we use convergence in distribution instead of point-wise convergence to define the convergence of IPMs.

To understand the issues and appreciate our framework, consider an EA operating in $\mathbb{R}^d$ on a fixed continuous optimization problem with different population sizes. When the population size is n, denote the algorithm by $\mathrm{EA}_n$. The kth generation produced by $\mathrm{EA}_n$ can be described by the joint distribution of n random vectors in $\mathbb{R}^d$, with each random vector representing an individual. Denote the random element modeling the kth generation by $P_k^n = (x_{k,1}^n, x_{k,2}^n, \ldots, x_{k,n}^n)$. Similarly, the same EA with population size (n+1) is denoted by $\mathrm{EA}_{n+1}$, and the kth generation it produces is modeled by $P_k^{n+1} = (x_{k,1}^{n+1}, x_{k,2}^{n+1}, \ldots, x_{k,n+1}^{n+1})$. Finally, denote the IPM for this EA by $\mathrm{EA}_\infty$, and the generations it produces by $(P_k^\infty)_{k=0,1,\ldots}$. Notice that each $P_k^\infty$ is a random sequence. Essentially, the convergence of IPMs requires that $\mathrm{EA}_\infty$ predicts every generation produced by $\mathrm{EA}_n$ as $n \to \infty$. Mathematically, this corresponds to the requirement that for each generation k, the sequence $P_k^1, P_k^2, \ldots$ converges to $P_k^\infty$ in some sense as $n \to \infty$.

However, it is not obvious how one can rigorously define convergence for the sequence $(P_k^n)_{n=1,2,\ldots}$. This is because $P_k^n$, $n = 1, 2, \ldots$, and the limit $P_k^\infty$ are all random elements taking values in different metric spaces. The range of $P_k^n$ is the Cartesian product of n copies of $\mathbb{R}^d$, whereas the range of $P_k^\infty$ is the infinite product space $\mathbb{R}^d \times \mathbb{R}^d \times \cdots$. To overcome this issue, Qi et al. essentially defined the convergence of IPMs as $P_k^n \xrightarrow{\text{m.p.w.}} P_k^\infty$, where m.p.w. stands for point-wise convergence of marginal p.d.f.s. However, as mentioned, we believe their proofs are problematic and incomplete.

In this research, we took a different approach. We extended $P_k^n$, unified the ranges of the random elements in a common metric space and gave a mathematically rigorous definition of sequence convergence. We assume that for each generation k, $\mathrm{EA}_n$ first generates an intermediate infinite sequence of individuals $Q_k^n = (y_{k,1}^n, y_{k,2}^n, \ldots)$ based on the previous generation $P_{k-1}^n$. Here $Q_k^n$ is a random sequence whose elements are conditionally independent and identically distributed (c.i.i.d.) given $P_{k-1}^n$. Then, $\mathrm{EA}_n$ preserves the first n elements of $Q_k^n$ to form the new generation $P_k^n$, i.e., $P_k^n = (y_{k,1}^n, y_{k,2}^n, \ldots, y_{k,n}^n)$. Basically, the modified $\mathrm{EA}_n$ progresses in the order $\ldots, Q_k^n, P_k^n, Q_{k+1}^n, P_{k+1}^n, \ldots$. For $\mathrm{EA}_\infty$, because $P_k^\infty$ is already a random sequence, we simply let $Q_k^\infty = P_k^\infty$. Then, we define that $\mathrm{EA}_\infty$ is convergent if and only if $Q_k^n \xrightarrow{d} Q_k^\infty$ for every k as $n \to \infty$, where $\xrightarrow{d}$ represents convergence in distribution, or equivalently weak convergence.

Our design has several advantages. Firstly, for every population size n, the sequence $P_k^n$, $k = 1, 2, \ldots$, coincides exactly with the population dynamics produced by $\mathrm{EA}_n$ without the intermediate sequence $Q_k^n$, $k = 1, 2, \ldots$. In other words, our model is a faithful model and the intermediate step does not change the population dynamics. Secondly, the ranges of $Q_k^n$, $n = 1, 2, \ldots$, and $Q_k^\infty$ are unified in the same metric space. Therefore, we can rigorously define the convergence of IPMs. Finally, in our proposed framework, the convergence of the stacking of operators and of iterating the algorithm can be proved. All these benefits come from the interplay between the finite-dimensional population dynamics $P_k^n$ and its infinite-dimensional extensions $Q_k^n$. The only modeling assumption in our framework is that new individuals are generated c.i.i.d. given the current generation. This is a reasonable assumption because exchangeability and c.i.i.d. are equivalent given the current population. We will present the framework and related topics in Section III.
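The construction of $\mathrm{EA}_n$ through an intermediate infinite sequence can be mimicked with a lazy generator. In the minimal sketch below, `offspring_stream` plays the role of $Q_k^n$ and `next_generation` keeps its first n elements to form $P_k^n$. The offspring rule (uniform choice of a parent plus Gaussian noise) and all function names are hypothetical; any rule that produces c.i.i.d. offspring given the parents would do. The last lines check, under a shared random seed, that truncating the infinite sequence yields exactly the generation a direct n-offspring loop would produce, which is the faithfulness property described above.

```python
import itertools
import random

def offspring_stream(parents, rng, sigma=0.1):
    """Lazily yield the infinite intermediate sequence Q_k^n.

    Hypothetical offspring rule: pick a parent uniformly at random and add N(0, sigma^2)
    noise. Given `parents`, the yielded offspring are therefore c.i.i.d.
    """
    while True:
        yield rng.choice(parents) + rng.gauss(0.0, sigma)

def next_generation(parents, n, rng):
    """P_k^n: keep only the first n elements of Q_k^n."""
    return list(itertools.islice(offspring_stream(parents, rng), n))

def run(initial, n, generations, seed):
    """Iterate EA_n in the order ..., Q_k^n, P_k^n, Q_{k+1}^n, P_{k+1}^n, ..."""
    rng = random.Random(seed)
    population = list(initial[:n])  # P_0^n: the first n elements of a shared initial sequence
    for _ in range(generations):
        population = next_generation(population, n, rng)
    return population

# Faithfulness check: with the same seed, truncating Q_k^n reproduces a direct n-offspring loop.
parents = [0.0, 1.0, 2.0]
rng_direct, rng_stream = random.Random(7), random.Random(7)
direct = [rng_direct.choice(parents) + rng_direct.gauss(0.0, 0.1) for _ in range(5)]
via_stream = next_generation(parents, 5, rng_stream)
print(direct == via_stream)  # True
```

Because the generator is consumed only as far as needed, the infinite sequence never has to be materialized; the design choice mirrors the fact that $Q_k^n$ is a mathematical device rather than extra computation.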

To illustrate the effectiveness of our framework, we perform a convergence analysis of IPMs of the simple EA. As our analyses show, the modeling assumption of exchangeability cannot yield the transition equation. Therefore, to obtain meaningful results, we adopt a “stronger” modeling assumption that individuals of the same generation in the IPM are identically and independently distributed (i.i.d.). This assumption seems restrictive at first sight, but it turns out to be a reasonable one. We analyze the mutation operator and the k-ary recombination operator. We show that these commonly used operators have the property of producing i.i.d. populations, in the sense that if the initial population is i.i.d., then as the population size goes to infinity, in the limit all subsequent generations are also i.i.d. This means that for these operators, the transition equation in the IPM can predict the real population dynamics as the population size goes to infinity. We also show that our results hold even if these operators are stacked together and iterated repeatedly by the algorithm. These results are presented in Section IV. Finally, in Section V we conclude the paper and propose future research.

For completeness, regarding [1], [2], there is a comment from Yong [3] with a reply. However, the comment was mainly about the latter part of [1], where the authors analyzed the properties of EAs based on the IPM. It did not discuss the proof for the model itself. For IPMs of EAs on discrete optimization problems, extensive research was done by Vose et al. in a series of studies. The problems under consideration were discrete optimization problems with finite solution space. The starting point of the authors’ analysis was to model each generation of the population as an “incidence vector”, which describes, for each point in the solution space, the proportion of the population it occupies. Based on this representation, the authors derived transition equations between incidence vectors of consecutive generations and analyzed their properties as the population size goes to infinity. However, for EAs on continuous solution space, the analyses of Vose et al. are not immediately applicable. This is because for continuous optimization problems the solution space is not denumerable. Therefore, the population cannot be described by a finite-dimensional incidence vector.
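The incidence-vector representation described above can be sketched for a finite solution space as follows; the function name and the 2-bit example space are hypothetical and not taken from Vose et al. The vector has one entry per point of the solution space, which is exactly why this representation does not carry over to a non-denumerable continuous space.

```python
from collections import Counter
from fractions import Fraction

def incidence_vector(population, solution_space):
    """Proportion of the population that occupies each point of a finite solution space."""
    counts = Counter(population)
    return [Fraction(counts[point], len(population)) for point in solution_space]

space = ["00", "01", "10", "11"]       # hypothetical 2-bit solution space
population = ["01", "01", "11", "00"]  # a population of size 4
print([str(p) for p in incidence_vector(population, space)])  # ['1/4', '1/2', '0', '1/4']
```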


II. Discussion of the Works of Qi et al. 

In this section we analyze the results of Qi et al. in [1], [2]. We begin by introducing some preliminaries for the analysis. Then, in Section II-B, following the notations and derivations in the authors’ papers, we provide a counterexample to show that the convergence proof for the transition equation in [1] is problematic. We further show that the modeling assumption of exchangeability cannot yield the transition equation in general. In Section II-C we show that the analyses in both [1] and [2] are incomplete. The authors did not prove the convergence of IPMs in the cases where operators are stacked together and the algorithm is iterated for multiple generations.

A. Preliminaries 

In the authors’ paper [1], the problem to be optimized is

$$\arg\max_{x} g(x) \quad \text{s.t. } x \in F \subseteq \mathbb{R}^m, \tag{2}$$

where F is the solution space and g is some given objective function. The analysis is intended to be general; therefore, no explicit form of g is assumed. The algorithm to be analyzed is the simple EA with proportionate selection and mutation. Let $X_k = \{x_k^i\}_{i=1}^N$ denote the kth generation produced by the EA, where N is the population size. To generate the (k+1)th population, an intermediate population $X'_k = \{x'^{\,i}_k\}_{i=1}^N$ is first generated from $X_k$ by the proportionate selection operator. The elements of $X'_k$ are c.i.i.d. given $X_k$. The distribution of $X'_k$ follows the conditional probability

$$P\left(x'^{\,j}_k = x_k^i \mid X_k\right) = \frac{g(x_k^i)}{\sum_{l=1}^N g(x_k^l)} \quad \text{for all } j = 1, 2, \ldots, N. \tag{3}$$

After selection, each individual in $X'_k$ is mutated to generate the individuals in $X_{k+1}$. The mutation is conducted following the conditional p.d.f.

$$f\left(x_{k+1}^j = x \mid x'^{\,j}_k = y\right) = f_w(x \mid y). \tag{4}$$

Overall, the algorithm is illustrated in Fig. 1.

After presenting the optimization problem and the algorithm, the authors proved the convergence of the IPM, which is the main result of [1]. It can be restated as follows.

Theorem 1 (Theorem 1 in Qi et al. [1]). Assume that the fitness function g(x) in (2) and the mutation operator of the simple EA described by (4) satisfy the following conditions:

1) $0 < g_{\min} \le g(x) \le g_{\max} < \infty, \ \forall x \in F$.

2) $\sup f_w(x \mid y) \le M < \infty$.

Then as $N \to \infty$, the time history of the simple EA can be described by a sequence of random vectors $\{x_k\}_{k=0}^\infty$ with densities

$$\mu_{k+1}(x) = \frac{\int_F \mu_k(y)\, g(y)\, f_w(x \mid y)\, dy}{\int_F \mu_k(y)\, g(y)\, dy}. \tag{5}$$

In Theorem 1, $\mu_k$ is the marginal p.d.f. of the kth generation predicted by the IPM.

As the proof of Theorem 1 in [1] and the analyses in this paper use the concept of exchangeability in probability theory, we list its definition and some basic facts.
Require: population size $N$; p.d.f. of the initial population $\mu_0$
1: $k \leftarrow 0$
2: sample the $N$ i.i.d. individuals $x_0^1, x_0^2, \ldots, x_0^N$ according to $\mu_0$
3: while the stopping criterion is not satisfied do
4:     select $x'^{\,1}_k, x'^{\,2}_k, \ldots, x'^{\,N}_k$ from $x_k^1, x_k^2, \ldots, x_k^N$ identically and independently according to the probability
           $P(x'^{\,j}_k = x_k^i \mid X_k) = \frac{g(x_k^i)}{\sum_{l=1}^N g(x_k^l)}, \ \forall i, j = 1, 2, \ldots, N$        ▷ Selection
5:     perturb $x'^{\,1}_k, x'^{\,2}_k, \ldots, x'^{\,N}_k$ to form the new generation $x_{k+1}^1, x_{k+1}^2, \ldots, x_{k+1}^N$ according to the common conditional p.d.f.
           $f(x_{k+1}^i = x \mid x'^{\,i}_k = y) = f_w(x \mid y), \ \forall i = 1, 2, \ldots, N$        ▷ Mutation
6:     $k \leftarrow k + 1$
7: end while

Fig. 1. The pseudocode of the simple EA
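A runnable Python sketch of the pseudocode in Fig. 1 is given below. It is our own transcription under stated assumptions, not code from Qi et al.: the stopping criterion is replaced by a fixed number of generations, and the fitness $g(x) = 2 + \sin x$, the initial p.d.f. $\mu_0 = N(0, 1)$ and the Gaussian mutation kernel with standard deviation 0.5 are hypothetical choices. `random.choices` with `weights` performs the N independent proportionate draws of step 4.

```python
import math
import random

def simple_ea(n, fitness, sample_initial, mutate, generations, seed=0):
    """Sketch of the simple EA in Fig. 1 (proportionate selection followed by mutation).

    A fixed number of generations stands in for the unspecified stopping criterion.
    Returns the list of generations X_0, X_1, ..., X_generations.
    """
    rng = random.Random(seed)
    population = [sample_initial(rng) for _ in range(n)]  # step 2: N i.i.d. draws from mu_0
    history = [population]
    for _ in range(generations):
        # Step 4: N independent draws; individual i is chosen with probability g(x_i) / sum_l g(x_l).
        selected = rng.choices(population, weights=[fitness(x) for x in population], k=n)
        # Step 5: mutate every selected parent independently with the kernel f_w(. | y).
        population = [mutate(y, rng) for y in selected]
        history.append(population)
    return history

def fitness(x):
    return 2.0 + math.sin(x)  # hypothetical objective with g_min = 1, g_max = 3

def sample_initial(rng):
    return rng.gauss(0.0, 1.0)  # assumed mu_0 = N(0, 1)

def mutate(y, rng):
    return y + rng.gauss(0.0, 0.5)  # assumed Gaussian f_w(x | y) with sigma = 0.5

history = simple_ea(20_000, fitness, sample_initial, mutate, generations=1, seed=3)
mean_1 = sum(history[1]) / len(history[1])
print(f"empirical mean of X_1: {mean_1:.3f}, transition-equation mean: {0.5 * math.exp(-0.5):.3f}")
```

For these choices the transition equation predicts a first-generation mean of $e^{-1/2}/2 \approx 0.303$ (by Stein's identity for the standard normal). Starting from an i.i.d. population with large N, the empirical mean should be close to it up to sampling noise; this one-step check says nothing about later generations.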


Definition 1 (Exchangeable random variables, Definition 1.1.1 in [8]). A finite set of random variables $\{x_i\}_{i=1}^n$ is said to be exchangeable if the joint distribution of $(x_1, x_2, \ldots, x_n)$ is invariant with respect to permutations of the indices $1, 2, \ldots, n$. A collection of random variables $\{x_\alpha : \alpha \in \Gamma\}$ is said to be exchangeable if every finite subset of $\{x_\alpha : \alpha \in \Gamma\}$ is exchangeable.

Definition 1 can also be extended to cover exchangeable random vectors or exchangeable random elements by replacing the term “random variables” in the definition with the respective term. One property of exchangeability is that if $\{x_i\}_{i=1}^n$ are exchangeable random elements, then the joint distributions of any $1 \le k \le n$ distinct ones of them are always the same (Proposition 1.1.1 in [8]). When k = 1, this property indicates that $\{x_i\}_{i=1}^n$ have the same marginal distribution. Another property is that an infinite sequence of random elements is exchangeable if and only if its elements are c.i.i.d. given some σ-field $\mathcal{G}$ (Theorem 1.2.2 in [8]). Moreover, any collection of c.i.i.d. random elements, finite or infinite, is exchangeable. Finally, an obvious fact is that i.i.d. random elements are exchangeable, but the converse is not necessarily true.
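The gap between exchangeability and independence can be checked exactly on a standard textbook example that is not taken from this paper, the Pólya urn: start with one red and one black ball, draw a ball, and return it together with one extra ball of the same color. The sketch below (function name hypothetical) computes sequence probabilities with exact fractions. Sequences with the same number of red draws are equally likely, so the draws are exchangeable, yet the first two draws are not independent.

```python
from fractions import Fraction
from itertools import product

def polya_probability(colors, red=1, black=1):
    """Exact probability of a sequence of draws from a Polya urn (1 = red, 0 = black).

    After each draw, the ball is returned together with one extra ball of the same colour.
    """
    p = Fraction(1)
    for c in colors:
        total = red + black
        if c == 1:
            p *= Fraction(red, total)
            red += 1
        else:
            p *= Fraction(black, total)
            black += 1
    return p

# Exchangeability: all length-3 sequences with the same number of reds are equally likely.
by_count = {}
for seq in product((0, 1), repeat=3):
    by_count.setdefault(sum(seq), set()).add(polya_probability(seq))
print(all(len(ps) == 1 for ps in by_count.values()))  # True

# No independence: P(x1 = red, x2 = red) = 1/3, whereas P(x1 = red)^2 = 1/4.
p_first_red = polya_probability((1,))
p_two_red = polya_probability((1, 1))
print(p_two_red, p_first_red ** 2)  # 1/3 1/4
```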

It can be seen that the simple EA generates c.i.i.d. individuals given the current population. Therefore, the individuals within the same generation are exchangeable, and they have the same marginal distribution. This leads to the transition equation of marginal p.d.f.s in Theorem 1. To analyze its proof and construct our framework, we will also use the definition and properties of exchangeability.

B. Convergence Proof of the Transition Equation 

In this section we analyze the proof of Theorem 1 and show that it is incorrect. The proof by Qi et al. is in Appendix A of [1]. In the proof, the authors assumed that individuals in the same generation are exchangeable; therefore, they have the same marginal distribution. After a series of derivation steps, the authors managed to obtain a transition equation between the density functions of $X_{k+1}$ and $X_k$:




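$$\mu_{x_{k+1}^i}(x) = \int_{F^N} \frac{\sum_{j=1}^N g(y_j)\, f_w(x \mid y_j)}{\sum_{l=1}^N g(y_l)}\, \mu_{X_k}(y_1, y_2, \ldots, y_N)\, dy_1\, dy_2 \cdots dy_N \tag{6}$$

$$= \mathbb{E}\left[\frac{\xi_k(x)}{\eta_k^N}\right] \quad \text{for any fixed } i, j, \tag{7}$$

where $\mu_{X_k}$ is the joint p.d.f. of $X_k$, $\mu_{x_{k+1}^i}$ is the marginal p.d.f. of $x_{k+1}^i$, and in (7)

$$\xi_k(x) = g(x_k^j)\, f_w(x \mid x_k^j) \quad \text{for any fixed } j, \tag{8}$$

$$\eta_k^N = \frac{1}{N} \sum_{l=1}^N g(x_k^l). \tag{9}$$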

(6) and (7) are exact. They accurately describe how the marginal p.d.f. of any individual in the next generation can be calculated from the joint p.d.f. of individuals in the current generation. Noticing that $\eta_k^N$ is the average of the exchangeable random variables $\{g(x_k^l)\}_{l=1}^N$, by the LLN for exchangeable random variables, the authors asserted that

$$\lim_{N \to \infty} \eta_k^N = \eta_k \quad \text{almost surely (a.s.)}. \tag{10}$$

The authors further asserted that $\eta_k$ is itself a random variable, satisfying

$$\mathbb{E}[\eta_k] = \mathbb{E}[g(x_k^j)] \quad \text{for any } j. \tag{11}$$

(10) and (11) correspond to (A13) and (A14) in Appendix A of [1], respectively. The authors’ proof is correct up to this step. However, the authors then asserted that

$\eta_k$ is independent of $\sum_{l=1}^N g(x_k^l)$ for any finite N. In particular, $\eta_k$ is independent of $g(x_k^j)$ for all $j = 1, 2, \ldots, N$.        (12)

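Based on this assertion, the authors then proved that for all k and x,

$$\lim_{N \to \infty} \left( \mathbb{E}\left[\frac{\xi_k(x)}{\eta_k^N}\right] - \frac{\mathbb{E}[\xi_k(x)]}{\mathbb{E}[\eta_k]} \right) = 0. \tag{13}$$

Therefore, the p.d.f. in (7) converges point-wise to $\mathbb{E}[\xi_k(x)] / \mathbb{E}[\eta_k]$. Noticing that the expression $\mathbb{E}[\xi_k(x)] / \mathbb{E}[\eta_k]$ is equal to the right-hand side of (5), the authors claimed that Theorem 1 is proved.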

In the following, we provide a counterexample to show that assertion (12) is not true. Then, we carry out further analysis to show that under the modeling assumption of exchangeability, the conclusion in (13), or equivalently Theorem 1, cannot be true in general.

1) On Assertion (12): We first reformulate the assertion. Since $\{x_k^l\}_{l=1}^N$ are exchangeable, $\{g(x_k^l)\}_{l=1}^N$ are exchangeable (Property 1.1.2 in [8]). Let $y_l = g(x_k^l)$, $l = 1, \ldots, N$. Then the premises of Theorem 1 are equivalent to

$$\{y_l\}_{l=1}^N \text{ are exchangeable and } g_{\min} \le y_l \le g_{\max}. \tag{14}$$
Let $y = \eta_k$. According to (9), (10) and (11), y has the properties that

$$y = \lim_{N \to \infty} \frac{1}{N} \sum_{l=1}^N y_l \quad \text{a.s.}, \tag{15}$$

$$\mathbb{E}(y) = \mathbb{E}(y_l) \quad \text{for any } l. \tag{16}$$

Since g is a general function, there are no other restrictions on $\{y_l\}$ and y. Therefore, (12) is equivalent to the following assertion:

For any $\{y_l\}_{l=1}^N$ and y satisfying (14), (15) and (16), y and $\sum_{l=1}^N y_l$ are independent for any finite N. In particular, y is independent of $y_l$ for any l.        (17)

However, we use the following counterexample (modified from Example 1.1.1 and related discussions on pages 11-12 in [8]) to show that assertion (17) is not true. Therefore, (12) is not true.

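2) Counterexample: Let $\{z_l\}_{l=1}^\infty$ be a sequence of i.i.d. random variables satisfying

$$-\frac{g_{\max} - g_{\min}}{4} < z_l < \frac{g_{\max} - g_{\min}}{4} \quad \text{and} \quad \mathbb{E}(z_l) = 0$$

for all l. Let y be a random variable independent of $\{z_l\}_{l=1}^\infty$ satisfying

$$\frac{g_{\max} + 3 g_{\min}}{4} < y < \frac{3 g_{\max} + g_{\min}}{4}.$$

Finally, let $y_l = z_l + y$ for all l.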

It can easily be verified that $\{y_l\}_{l=1}^N$ and y satisfy (14) and (16). Since $z_l$ is bounded, $\mathbb{E}(|z_l|) < \infty$ for any l. By the strong law of large numbers (SLLN) for i.i.d. random variables,

$$\frac{1}{N} \sum_{l=1}^N z_l \to 0 \quad \text{a.s. as } N \to \infty.$$

Therefore (15) is also satisfied, i.e., y is the limit of $\frac{1}{N}\sum_{l=1}^N y_l$ as $N \to \infty$. However, because $\sum_{l=1}^N y_l = N y + \sum_{l=1}^N z_l$ and y is independent of $\{z_l\}_{l=1}^\infty$, it can be seen that $\sum_{l=1}^N y_l$ is not independent of y except in some degenerate cases (for example, when y is a constant). In particular, in general $y_l = y + z_l$ is not independent of y for any l. Therefore, assertion (17) is not true. Equivalently, assertion (12) is not true.
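The counterexample can also be checked by simulation. The sketch below assumes $g_{\min} = 1$ and $g_{\max} = 5$ and, as one admissible choice, takes $z_l$ uniform on $(-1, 1)$ and y uniform on $(2, 4)$, so every $y_l = z_l + y$ lies in $(g_{\min}, g_{\max})$; the names are hypothetical. Within one long realization, the sample mean of the $y_l$ approaches that realization's y, as in (15); across many independent realizations, the sample covariance of y and $y_1$ stays near $\mathbb{V}(y) = 1/3$ rather than 0, so y and $y_1$ are not independent.

```python
import random

G_MIN, G_MAX = 1.0, 5.0  # hypothetical bounds g_min and g_max

def sample_counterexample(n, rng):
    """Draw y and y_1, ..., y_n with y_l = z_l + y as in the counterexample.

    z_l ~ Uniform(-(G_MAX - G_MIN)/4, (G_MAX - G_MIN)/4), i.i.d. with mean 0, and
    y ~ Uniform((G_MAX + 3*G_MIN)/4, (3*G_MAX + G_MIN)/4), independent of the z_l,
    so every y_l lies strictly between G_MIN and G_MAX.
    """
    half_width = (G_MAX - G_MIN) / 4
    y = rng.uniform((G_MAX + 3 * G_MIN) / 4, (3 * G_MAX + G_MIN) / 4)
    return y, [y + rng.uniform(-half_width, half_width) for _ in range(n)]

def covariance(a, b):
    """Unbiased sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(2024)

# (15): within one long realisation, the sample mean approaches that realisation's y.
y, ys = sample_counterexample(10_000, rng)
gap = abs(sum(ys) / len(ys) - y)

# (17) fails: over many independent realisations, y and y_1 co-vary.
draws = [sample_counterexample(1, rng) for _ in range(20_000)]
cov_y_y1 = covariance([d[0] for d in draws], [d[1][0] for d in draws])
print(f"|mean - y| = {gap:.4f}, Cov(y, y_1) = {cov_y_y1:.3f} (theory: Var(y) = 1/3)")
```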

3) Further Analysis: In [1] the authors intended to prove Theorem 1, or equivalently (13). As shown by the counterexample, assertion (12) is not true. This renders the authors’ proof of (13) invalid.

In the following, we carry out further analysis to show that (13) cannot be true in general, even considering other methods of proof and adding new sufficient conditions. Therefore, in general, Theorem 1 cannot be true.

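To begin with, consider the random variable $\xi_k(x) / \eta_k^N$. We prove the following lemma.

Lemma 1. $\mathbb{E}\left[\frac{\xi_k(x)}{\eta_k^N}\right] \to \mathbb{E}\left[\frac{\xi_k(x)}{\eta_k}\right]$ as $N \to \infty$.

Proof. According to (10), $\eta_k^N \xrightarrow{\text{a.s.}} \eta_k$. Since $g_{\min} \le \eta_k^N \le g_{\max}$, we have $0 < g_{\min} \le \eta_k \le g_{\max}$ almost surely. Since $h(x) = 1/x$ is continuous on $(0, \infty)$, we have

$$\frac{1}{\eta_k^N} \xrightarrow{\text{a.s.}} \frac{1}{\eta_k} \quad \text{(Proposition 47.2 in [9])}.$$

Then we have

$$\frac{\xi_k(x)}{\eta_k^N} \xrightarrow{\text{a.s.}} \frac{\xi_k(x)}{\eta_k} \quad \text{(Proposition 47.4 (ii) in [9])}.$$

Finally, by the conditions in Theorem 1, $0 \le \frac{\xi_k(x)}{\eta_k^N} \le \frac{g_{\max} M}{g_{\min}}$. By the Lebesgue Dominated Convergence Theorem (Proposition 11.30 in [10]), we have $\mathbb{E}\left[\frac{\xi_k(x)}{\eta_k^N}\right] \to \mathbb{E}\left[\frac{\xi_k(x)}{\eta_k}\right]$ as $N \to \infty$.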


Now by Lemma 1, (13) is equivalent to

$$\mathbb{E}\left[\frac{\xi_k(x)}{\eta_k}\right] = \frac{\mathbb{E}[\xi_k(x)]}{\mathbb{E}[\eta_k]}. \tag{A}$$


Now it is clear that if the only assumption is exchangeability, (A) need not hold, whatever method of proof is used. Of course, if (12) is true and $\xi_k(x)$ and $\eta_k$ are independent, then (A) is true. However, as already shown by the counterexample, (12) is not true in general. Therefore, (A), and equivalently Theorem 1, are in general not true.

A natural question then arises: is it possible to introduce some reasonable sufficient conditions such that (A) can be proved? One such condition frequently used is that $\eta_k = \mathbb{E}[g(x_k^j)]$, i.e., $\eta_k^N$ converges to its expectation, a constant which equals $\mathbb{E}[g(x_k^j)]$ for any j. However, the following analysis shows that, given the modeling assumption of exchangeability, this condition does not hold in general. Therefore, it cannot be introduced.

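For exchangeable random variables we have

$$\mathbb{V}(\eta_k) = \lim_{N \to \infty} \mathbb{V}(\eta_k^N) \tag{18}$$

$$= \lim_{N \to \infty} \mathbb{E}\left[\left(\frac{1}{N}\sum_{l=1}^N g(x_k^l) - \mathbb{E}\left[\frac{1}{N}\sum_{l=1}^N g(x_k^l)\right]\right)^2\right]$$

$$= \lim_{N \to \infty} \frac{1}{N^2} \sum_{l=1}^N \sum_{m=1}^N \mathbb{C}\left[g(x_k^l), g(x_k^m)\right]$$

$$= \lim_{N \to \infty} \left( \frac{\mathbb{V}[g(x_k^1)]}{N} + \frac{N-1}{N}\, \mathbb{C}\left[g(x_k^1), g(x_k^2)\right] \right) \tag{19}$$

$$= \mathbb{C}\left[g(x_k^1), g(x_k^2)\right], \tag{20}$$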

where $\mathbb{V}(x)$ is the variance of x and $\mathbb{C}(x, y)$ is the covariance of x and y. (18) follows from the boundedness of $\eta_k^N$ and the Lebesgue Dominated Convergence Theorem, (19) follows from the exchangeability of $\{x_k^l\}_{l=1}^N$, and (20) follows from the boundedness of g by letting N go to infinity. Now it is clear that if the only modeling assumption is exchangeability, there is no guarantee that $\mathbb{C}[g(x_k^1), g(x_k^2)] = 0$. Therefore, in general $\eta_k^N$ does not converge to a constant. Thus, this condition cannot be introduced as a sufficient condition in order to prove (A).

TABLE I
Population dynamics of $\mathrm{EA}_n$ under operator H

EA \ k          k = 0             k = 1                        k = 2                                     ...
EA_1            $P_0^1$           $H_1(P_0^1)$                 $H_1(H_1(P_0^1))$                         ...
⋮
EA_n            $P_0^n$           $H_n(P_0^n)$                 $H_n(H_n(P_0^n))$                         ...
⋮
EA_∞            $P_0^\infty$      $H_\infty(P_0^\infty)$       $H_\infty(H_\infty(P_0^\infty))$          ...
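Equations (18)-(20) can be illustrated numerically with exchangeable variables that share a random component. The sketch below, with hypothetical names, evaluates the exact expression $\mathbb{V}/N + \frac{N-1}{N}\mathbb{C}$ from (19) for the assumed values $\mathbb{V}(y_l) = 2/3$ and $\mathbb{C}(y_l, y_m) = 1/3$, which arise when $y_l = y + z_l$ with y uniform on (2, 4) and $z_l$ uniform on (-1, 1), and compares it with a Monte Carlo estimate for N = 50. The variance of the average levels off at the covariance 1/3 instead of vanishing, so the average does not converge to a constant.

```python
import random
from fractions import Fraction

def variance_of_mean(var, cov, n):
    """Exact V((1/N) * sum_l X_l) for N exchangeable variables with common variance `var`
    and common pairwise covariance `cov`, i.e. var / N + (N - 1) / N * cov as in (19)."""
    return var / n + Fraction(n - 1, n) * cov

# Assumed setting: y_l = y + z_l, y ~ U(2, 4), z_l ~ U(-1, 1), so V(y_l) = 2/3 and C = V(y) = 1/3.
VAR, COV = Fraction(2, 3), Fraction(1, 3)
for n in (1, 10, 100, 10_000):
    print(n, float(variance_of_mean(VAR, COV, n)))  # decreases towards 1/3, not towards 0

# Monte Carlo check for N = 50, where the exact value is 17/50 = 0.34.
rng = random.Random(11)
means = []
for _ in range(5_000):
    y = rng.uniform(2.0, 4.0)
    means.append(sum(y + rng.uniform(-1.0, 1.0) for _ in range(50)) / 50)
m = sum(means) / len(means)
mc_var = sum((x - m) ** 2 for x in means) / (len(means) - 1)
print(f"simulated variance of the average for N = 50: {mc_var:.3f}")
```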

4) Summary: As the analyses in this section show, the transition equation (5) does not hold under the modeling assumption of exchangeability. However, this does not preclude the possibility of strengthening the modeling assumption so that it yields analytical results similar to the transition equation (5). We deal with this issue by adopting the “stronger” i.i.d. assumption when building IPMs. However, before presenting our framework and analyses, we show why the proofs in both [1] and [2] are incomplete.

C. The Issue of the Stacking of Operators and Iterating the Algorithm

In the following, we discuss IPMs from another perspective. Consider an EA with only one operator, denoted by H. When the population size is n, denote this EA by $\mathrm{EA}_n$ and the operator it actually uses by $H_n$. Let $P_k^n = (x_{k,1}^n, x_{k,2}^n, \ldots, x_{k,n}^n)$ denote the kth generation produced by $\mathrm{EA}_n$. Then the transition rule between consecutive generations produced by $\mathrm{EA}_n$ can be described by $P_{k+1}^n = H_n(P_k^n)$. In Table I we write down the population dynamics of $\mathrm{EA}_n$. Each row in Table I shows the population dynamics produced by the corresponding $\mathrm{EA}_n$, with $P_k^n$ expanded as $[H_n]^k(P_0^n)$. Let $\mathrm{EA}_\infty$ denote the IPM, and let $P_k^\infty = [H_\infty]^k(P_0^\infty)$ denote the populations predicted by $\mathrm{EA}_\infty$. Then we can summarize the results in [1] in the following way.

Let H represent the combined operator of proportionate selection and mutation. Though the authors originally developed the transition equation from the kth to the (k+1)th generation, without loss of generality we can consider only the populations from the initial generation onward. Assume that the initial population comes from a known sequence of individuals, represented by $P_0 = (x_{0,1}, x_{0,2}, \ldots)$. For $\mathrm{EA}_n$, its initial population $P_0^n$ consists of the first n elements of $P_0$, i.e., $P_0^n = (x_{0,1}, x_{0,2}, \ldots, x_{0,n})$. Let $P_0^\infty = P_0$. This setting represents the fact that every $\mathrm{EA}_n$ uses the same initial population, and $\mathrm{EA}_\infty$ knows this exact initial population. The aim of $\mathrm{EA}_\infty$ is to predict the subsequent populations. Considering that $P_0^n$ and $P_0^\infty$ all come from $P_0$, if we redefine $H_n$ to be an operator on $P_0$ which only takes the first n elements to produce the next generation, then the authors essentially proved that

$$H_n(P_0) \xrightarrow{\text{m.p.w.}} H_\infty(P_0) \quad \text{as } n \to \infty, \tag{21}$$

where m.p.w. stands for point-wise convergence of marginal p.d.f.s.

However, apart from the fact that this proof is problematic, the authors’ proof covers only one iteration step, corresponding to the column-wise convergence of the k = 1 column in Table I. The problem is that even if (21) is true, it does not automatically lead to the conclusion that for an arbitrary kth step, $[H_n]^k(P_0) \xrightarrow{\text{m.p.w.}} [H_\infty]^k(P_0)$ as $n \to \infty$. In other words, one has to study whether the transition equation for one step can be iterated recursively to predict the populations after multiple steps. In Table I, this problem corresponds to whether the other columns have a similar column-wise convergence property once the convergence of the k = 1 column is proved.

To give an example, consider the column k = 2 in Table I. To prove column-wise convergence, the authors need to prove that, given (21),

$$H_n(P_1^n) \xrightarrow{\text{m.p.w.}} H_\infty(P_1^\infty), \quad \text{or equivalently} \tag{22}$$

$$[H_n]^2(P_0) \xrightarrow{\text{m.p.w.}} [H_\infty]^2(P_0) \tag{23}$$

as $n \to \infty$. Compared with (21), (22) has the same sequence of operators but a sequence of converging inputs, while (23) has the same input but a sequence of different operators. Therefore, (22) and (23) are not necessarily true even if (21) is proved. In fact, different techniques may have to be adopted to prove (21) and (22), or equivalently (21) and (23). A similar problem exists for an arbitrary kth generation. We call this problem the issue of iterating the algorithm. As Qi et al. ignored this issue in both [1] and [2], we believe their proofs are incomplete.
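To make the column-wise view of Table I concrete, the sketch below simulates rows $\mathrm{EA}_n$ for a mutation-only operator, the simplest case, and compares each column with an IPM prediction. The operator (independent $N(0, \sigma^2)$ noise added to every individual), the i.i.d. $N(0, 1)$ initial population and all names are assumptions made for illustration. With this operator an i.i.d. population stays i.i.d. for every n, so the IPM predicts the marginal $N(0, 1 + k\sigma^2)$ in column k. Such a numerical check can only illustrate column-wise convergence; it does not replace a proof of (22) or (23) for operators such as proportionate selection.

```python
import random

def mutation(population, sigma, rng):
    """Hypothetical mutation operator H_n: add independent N(0, sigma^2) noise to each individual."""
    return [x + rng.gauss(0.0, sigma) for x in population]

def table_row(n, k_max, sigma, seed):
    """Row EA_n of Table I: P_0^n, H_n(P_0^n), ..., [H_n]^k_max (P_0^n)."""
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. N(0, 1) initial population
    row = [population]
    for _ in range(k_max):
        population = mutation(population, sigma, rng)
        row.append(population)
    return row

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

SIGMA, K_MAX = 0.5, 3
for n in (10, 100, 20_000):
    row = table_row(n, K_MAX, SIGMA, seed=5)
    # Deviation of the empirical variance in column k from the IPM value 1 + k * sigma^2.
    print(n, [round(sample_variance(p) - (1 + k * SIGMA ** 2), 3) for k, p in enumerate(row)])
```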

The issue of the stacking of operators is similar. Given some operator H satisfying (21) and some operator G satisfying

$$G_n(P_0) \xrightarrow{\text{m.p.w.}} G_\infty(P_0)$$

as $n \to \infty$, it is not necessarily true that

$$H_n(G_n(P_0)) \xrightarrow{\text{m.p.w.}} H_\infty(G_\infty(P_0))$$

as $n \to \infty$. However, the authors of [2] ignored this issue and combined the transition equations for selection, mutation and crossover (in Section III of [2]) without any justification.

In addition, several statements in the authors'
proofs in [12] are questionable. First, in the first paragraph
of Appendix A (the proof for Theorem 1 in that paper), the
authors considered a pair of parents $x_k$ and $x'_k$. For the uniform
crossover operator, $x_k$ and $x'_k$ are “drawn from the population
independently with the same density of $f_{x_k} = f_{x'_k}$”. Then,
the authors claimed that “the joint density of $x_k$ and $x'_k$ is
therefore $f_{x_k} \cdot f_{x'_k}$”. This is simply not true. Two individuals
drawn independently from the same population are conditionally
independent given the population; they are not necessarily independent,
unless the modeling assumption is that all individuals in the
same population are independent. In fact, without the i.i.d.
assumption, it is very likely that individuals in the same
population are dependent. Therefore, the joint density function
of $x_k$ and $x'_k$ is not necessarily $f_{x_k} \cdot f_{x'_k}$, and the authors'
proof for Theorem 1 in [12] is dubious at best. On the other
hand, even if the authors' modeling assumption is independence
of individuals for the uniform crossover operator, this
assumption is incompatible with the modeling assumption
of exchangeability in [12] for the operators of selection and
mutation. Therefore, combining the transition equations for all
three operators is problematic, because the assumption
of independence cannot hold beyond one iteration step.

Another issue in [12] is that the uniform crossover operator
produces two dependent offspring at the same time. As a
result, after uniform crossover, the intermediate population is
not even exchangeable because it has pair-wise dependency
between individuals. Then the same problem arises, that is, the
transition equation for the uniform crossover operator cannot
be combined with the transition equations for selection and
mutation. This is because the uniform crossover operator produces
intermediate populations without exchangeability, but
this property is required for modeling selection and mutation.
Besides, the transition equation for the uniform crossover
operator cannot be iterated beyond one step. This is because,
whether its modeling assumption is independence or exchangeability,
this assumption will surely be corrupted beyond
one iteration step.

In summary, several issues arise from previous studies on
IPMs for EAs on continuous optimization problems. Therefore,
new frameworks and proof methods are needed for
analyzing the convergence of IPMs and studying the issues of
the stacking of operators and iterating the algorithm.

III. Proposed Framework 

In this section, we present our proposed analytical framework.
In constructing the framework we strive to achieve the
following three goals.

1) The framework should be general enough to cover real-world
operators and to characterize the evolutionary
process of real EAs.

2) The framework should be able to define the convergence
of IPMs and serve as a justification for using them. The
definition should match intuition and at the
same time be mathematically rigorous.

3) The framework should provide an infrastructure to study
the issues of the stacking of operators and iterating the
algorithm.

The contents of this section roughly reflect the pursuit
of the first two goals. The third goal is reflected in the
analyses of the simple EA in Section IV. More specifically,
in Section III-A we introduce notations and preliminaries for
the remainder of this paper. In Section III-B we present our
framework. In the framework, each generation is modeled by a
random sequence. This approach unifies the spaces of random
elements modeling populations of different sizes. In Section
III-C we define the convergence of the IPM as convergence in
distribution on the space of random sequences. We summarize
and discuss our framework in Section III-D.

To appreciate the significance of our framework, it is
worth reviewing the methodology in [11], [12] for studying the
convergence of IPMs. Implicitly, the authors in [11], [12] used
point-wise convergence of marginal p.d.f.s as the criterion for
defining the convergence of IPMs. Apart from the proofs being
problematic and incomplete, this definition does not consider
the joint distribution of individuals of the population. Thus, it
loses information and cannot characterize the dynamics of the
whole population. Besides, point-wise convergence of p.d.f.s
depends on the existence and the explicit forms of the p.d.f.s.
This fact limits the generality of the methodology. In addition,
compared with convergence in distribution used in this paper,
the criterion of point-wise convergence is unnecessarily strict.
In essence, the core of the criterion should characterize the
similarity between distributions of random elements. In this
regard, convergence in distribution matches the intuition and
suffices for the task. A stronger criterion, such as point-wise
convergence, will inevitably increase the difficulty of the analysis.
Finally, in this paper we separate the framework (the definition
of the convergence of IPMs) from the analyses of operators.
This organization is logical and general.

A. Notations and Preliminaries 

In the remainder of this paper we focus on the unconstrained
continuous optimization problem

$$\arg\max_{x} \; g(x) \quad \text{s.t. } x \in \mathbb{R}^d, \tag{24}$$

where g is some given objective function. Our framework is
general enough that it does not require other conditions
on the objective function g. However, to prove the convergence
of IPMs for mutation and recombination, conditions such as
those in Theorem 1 are sometimes needed. We will introduce
them when they are required.

From now on we use $\mathbb{N}$ to denote the set of nonnegative
integers and $\mathbb{N}^+$ the set of positive integers. For any two real
numbers a and b, let $a \wedge b$ be the smaller of them and
$a \vee b$ the larger of them. Let x, y be random elements
of some measurable space $(\Omega, \mathcal{F})$. We use $\mathcal{L}(x)$ to represent
the law of x. If x and y follow the same law, i.e. $P(x \in A) =
P(y \in A)$ for every $A \in \mathcal{F}$, we write $\mathcal{L}(x) = \mathcal{L}(y)$. Note
that $\mathcal{L}(x) = \mathcal{L}(y)$ and $x = y$ have different meanings. In
particular, $x = y$ indicates dependency between x and y.
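A small numerical illustration of this distinction, using a hypothetical example of our own: take $x \sim N(0, 1)$ on $\mathbb{R}$ and define $y = -x$ on the same sample points. By the symmetry of the normal law, $\mathcal{L}(x) = \mathcal{L}(y)$, yet $x \ne y$ almost surely. The sketch only checks one event, $A = (-\infty, 0.5]$, empirically; it illustrates the definition and proves nothing.

```python
import random

rng = random.Random(0)
n_samples = 100_000

# Realizations x(omega) of x ~ N(0, 1), and y(omega) = -x(omega) on the same omegas.
xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
ys = [-x for x in xs]

# Same law: the empirical probability of A = (-inf, 0.5] is (up to sampling error)
# the same for x and for y.
p_x = sum(x <= 0.5 for x in xs) / n_samples
p_y = sum(y <= 0.5 for y in ys) / n_samples

# Different random elements: x(omega) == y(omega) would require x(omega) == 0.
n_equal = sum(x == y for x, y in zip(xs, ys))

print(p_x, p_y, n_equal)
```

Both empirical probabilities are close to $\Phi(0.5) \approx 0.69$, while the two random elements never coincide on the sampled points.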

We use the notation $(x_i)_{i=m}^n$ to represent the array
$(x_m, x_{m+1}, \dots, x_n)$. When $n = \infty$, $(x_i)_{i=m}^\infty$ represents
the infinite sequence $(x_m, x_{m+1}, \dots)$. We use $\{x_i\}_{i=m}^n$ and
$\{x_i\}_{i=m}^\infty$ to represent the collections $\{x_m, x_{m+1}, \dots, x_n\}$ and
$\{x_i \mid i = m, m+1, \dots\}$, respectively. When the range is clear,
we use $(x_i)_i$ and $\{x_i\}_i$ or $(x_i)$ and $\{x_i\}$ for short.

Let $\mathbb{S}$ denote the solution space $\mathbb{R}^d$. This simplifies our
notation system when we discuss the spaces $\mathbb{S}^n$ and $\mathbb{S}^\infty$. In
the following, we define metrics and σ-fields on $\mathbb{S}$, $\mathbb{S}^n$ and
$\mathbb{S}^\infty$ and state properties of the corresponding measurable spaces.

$\mathbb{S}$ is equipped with the ordinary metric $\rho(x, y) =
[\sum_{i=1}^d (x_i - y_i)^2]^{1/2}$. Let $\mathcal{S}$ denote the Borel σ-field on $\mathbb{S}$
generated by the open sets under ρ. Together $(\mathbb{S}, \mathcal{S})$ defines a
measurable space.

Similarly, $\mathbb{S}^n$ is equipped with the metric $\rho_n(x, y) =
[\sum_{i=1}^n \rho^2(x_i, y_i)]^{1/2}$, and the corresponding Borel σ-field under
$\rho_n$ is denoted by $\mathcal{S}'^n$. Together $(\mathbb{S}^n, \mathcal{S}'^n)$ is the measurable
space for n-tuples.
Next, consider the space of infinite sequences $\mathbb{S}^\infty =
\{(x_1, x_2, \dots) \mid x_i \in \mathbb{S}, i \in \mathbb{N}^+\}$. It is equipped with the metric

$$\rho_\infty(x, y) = \sum_{i=1}^\infty \frac{1}{2^i} \cdot \frac{\rho(x_i, y_i)}{1 + \rho(x_i, y_i)}.$$

The Borel σ-field on $\mathbb{S}^\infty$ under $\rho_\infty$ is denoted by $\mathcal{S}'^\infty$. Then
$(\mathbb{S}^\infty, \mathcal{S}'^\infty)$ is the measurable space for infinite sequences.
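To make the three metrics concrete, here is a minimal sketch, assuming points of $\mathbb{S} = \mathbb{R}^d$ are stored as tuples and infinite sequences as Python iterators (both representation choices are ours). It implements $\rho$, $\rho_n(x, y) = [\sum_{i=1}^n \rho^2(x_i, y_i)]^{1/2}$, and $\rho_\infty(x, y) = \sum_{i} 2^{-i} \rho(x_i, y_i) / (1 + \rho(x_i, y_i))$ truncated after finitely many coordinates; since each summand is below $2^{-i}$, the truncation error is explicitly bounded.

```python
import itertools
import math
from typing import Iterable, Sequence

Point = Sequence[float]  # an element of S, represented as a tuple of d coordinates

def rho(x: Point, y: Point) -> float:
    """Euclidean metric on S."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def rho_n(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Metric on S^n: square root of the sum of squared per-individual distances."""
    return math.sqrt(sum(rho(a, b) ** 2 for a, b in zip(xs, ys)))

def rho_inf(xs: Iterable[Point], ys: Iterable[Point], terms: int = 60) -> float:
    """Metric on S^infinity, truncated after `terms` coordinates.

    Each summand is below 2**-i, so the omitted tail is smaller than 2**-terms.
    """
    total = 0.0
    for i, (a, b) in enumerate(itertools.islice(zip(xs, ys), terms), start=1):
        r = rho(a, b)
        total += (r / (1.0 + r)) / 2 ** i
    return total

# Two constant infinite sequences, represented lazily.
origin = itertools.repeat((0.0, 0.0))
shifted = itertools.repeat((3.0, 4.0))
print(rho((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(rho_inf(origin, shifted))      # close to 5/6
```

The bounded summands are what make $\rho_\infty$ finite for every pair of sequences: even two sequences that are far apart in every coordinate are at distance less than 1.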

Since $\mathbb{S}$ is separable and complete, it can be proved that
$\mathbb{S}^n$ and $\mathbb{S}^\infty$ are also separable and complete (Appendix M6
in [10]). In addition, because of separability, the Borel σ-fields
$\mathcal{S}'^n$ and $\mathcal{S}'^\infty$ are equal to $\mathcal{S}^n$ and $\mathcal{S}^\infty$, respectively.
In other words, the Borel σ-fields $\mathcal{S}'^n$ and $\mathcal{S}'^\infty$ generated by
the collection of open sets under the corresponding metrics
coincide with the product σ-fields generated by all measurable
rectangles ($\mathcal{S}^n$) and all measurable cylinder sets ($\mathcal{S}^\infty$),
respectively (Lemma 1.2 in [?]). Therefore, from now on we
write $\mathcal{S}^n$ and $\mathcal{S}^\infty$ for the corresponding Borel σ-fields. Finally,
let $\mathbb{M}$, $\mathbb{M}^n$ and $\mathbb{M}^\infty$ denote the sets of all random elements of
$\mathbb{S}$, $\mathbb{S}^n$ and $\mathbb{S}^\infty$, respectively.

Let $\pi_n : \mathbb{S}^\infty \to \mathbb{S}^n$ be the natural projection: $\pi_n(x) =
(x_1, x_2, \dots, x_n)$. Since, given $x \in \mathbb{M}^\infty$, $(\pi_n \circ x) : \Omega \to \mathbb{S}^n$
defines a random element of $\mathbb{S}^n$ projected from $\mathbb{S}^\infty$, we also
use $\pi_n$ to denote the mapping $\pi_n : \mathbb{M}^\infty \to \mathbb{M}^n$, where
$\pi_n(x) = (x_1, x_2, \dots, x_n)$. By definition, $\pi_n$ is the operator
which truncates random sequences to random vectors. Given
$A \subset \mathbb{S}^\infty$, we use $\pi_n(A)$ to denote the projection of A, i.e.
$\pi_n(A) = \{x \in \mathbb{S}^n : x = \pi_n(y) \text{ for some } y \in A\}$.


B. Analytical Framework for EA and IPMs 

In this section, we present an analytical framework for the 
EA and IPMs. First, the modeling assumptions are stated. We 
only deal with operators which generate c.i.i.d. individuals. 
Then, we present an abstraction of the EA and IPMs. This 
abstraction serves as the basis for building our framework. 
Finally, the framework is presented. It unifies the range spaces 
of the random elements and defines the convergence of IPMs. 

1) Modeling Assumptions: We assume that the EA on the
problem (24) is time homogeneous and Markovian, such that
the next generation depends only on the current one, and
the transition rule from the kth generation to the (k+1)th
generation is invariant with respect to $k \in \mathbb{N}$. We further
assume that individuals in the next generation are c.i.i.d. given
the current generation. As this assumption is the only extra
assumption introduced in the framework, it may need some
further explanation.

The main reason for introducing this assumption is to
simplify the analysis. Being conditionally i.i.d. implies exchangeability;
therefore, individuals in the same generation
$k \in \mathbb{N}^+$ are always exchangeable. As a result, it is possible to
exploit the symmetry in the population and study the transition
equations of marginal distributions. Besides, it is because
of conditional independence that we can easily expand the
random elements modeling finite-sized populations to random
sequences, and therefore define convergence in distribution for
random elements of the corresponding metric space. In addition,
many real-world operators in EAs satisfy this assumption,
such as the proportionate selection operator and the crossover
operator analyzed in [11], [?].
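To make the c.i.i.d. assumption concrete, here is a minimal sketch of proportionate selection on the real line, assuming a nonnegative objective g that is positive somewhere on the population, and a population stored as a Python list (the function names are hypothetical). Given the current population, every new individual is produced by the same rule with fresh randomness, which is exactly what makes the offspring conditionally i.i.d.

```python
import math
import random
from typing import Callable, List, Sequence

Objective = Callable[[float], float]

def select_one(pop: Sequence[float], g: Objective, rng: random.Random) -> float:
    """Draw ONE new individual: pop[i] is chosen with probability g(pop[i]) / sum_j g(pop[j])."""
    weights = [g(x) for x in pop]
    return rng.choices(pop, weights=weights, k=1)[0]

def proportionate_selection(pop: Sequence[float], g: Objective,
                            rng: random.Random) -> List[float]:
    """G_n: n new individuals, each from an independent call of select_one on the same pop.

    Given pop, the draws share one rule and use fresh randomness, so they are c.i.i.d.
    (Recomputing the weights per call is wasteful but keeps that structure explicit.)
    """
    return [select_one(pop, g, rng) for _ in range(len(pop))]

rng = random.Random(1)
pop = [rng.gauss(0.0, 1.0) for _ in range(8)]
new_pop = proportionate_selection(pop, lambda x: math.exp(-x * x / 2), rng)
print(new_pop)
```

Unconditionally, the selected individuals are dependent (they share the same parents), but given `pop` they are i.i.d.; this is the distinction between conditional independence and independence used throughout this section.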

However, we admit that there are some exceptions to our
assumption. The most notable one may be the mutation operator,
though it does not pose significant difficulties. The mutation
operator perturbs each individual in the current population
independently, according to a common conditional p.d.f. If
the current population is not exchangeable, then after mutation
the resultant population is not exchangeable either. Therefore,
it seems that mutation does not produce c.i.i.d. individuals.
However, mutation is often used along with other operators,
and as long as these other operators
generate c.i.i.d. populations, the individuals after mutation will
be c.i.i.d., too. Therefore, a combined operator of mutation
and any other operator satisfying the c.i.i.d. assumption
satisfies our assumption. An example can be seen in [11], where
mutation is analyzed together with proportionate selection. On
the other hand, an algorithm which only uses mutation is very
simple. It can be readily modeled and analyzed without much
difficulty.

Perhaps more significant exceptions are operators such as 
selection without replacement, or the crossover operator which 
produces two dependent offspring at the same time. In fact, 
for these operators not satisfying the c.i.i.d. assumption, it 
is still possible to expand the random elements modeling 
finite-sized populations to random sequences. For example, the
random elements can be padded with some fixed constants or 
random elements of known distributions to form the random 
sequences. In this way, our definition of the convergence of 
IPMs can still be applied. However, whether convergence in
distribution for these random sequences can still yield meaningful
results similar to the transition equation in this scenario
is an open problem that needs further investigation.
Nonetheless, our assumption is equivalent to the exchangeability
assumption generally used in previous studies.

2) The Abstraction of EA and IPMs: Given the modeling
assumptions, we develop an abstraction to describe the population
dynamics of the EA and IPMs.

Let the EA with population size n be denoted by $\mathrm{EA}_n$,
and the kth ($k \in \mathbb{N}$) generation it produces be modeled as a
random element $P_k^n = (x_{k,i}^n)_{i=1}^n \in \mathbb{M}^n$, where $x_{k,i}^n \in \mathbb{M}$ is a
random element representing the ith individual in $P_k^n$. Without
loss of generality, assume that the EA has two operators, G
and H. In each iteration, the EA first employs G on the current
population to generate an intermediate population, on which
it then employs H to generate the next population. Notice that
here G and H are just terms representing the operators in the
real EA. They facilitate describing the evolutionary process.
For $\mathrm{EA}_n$, G and H are actually instantiated as functions
from $\mathbb{M}^n$ to $\mathbb{M}^n$, denoted by $G_n$ and $H_n$, respectively. For
example, if G represents proportionate selection, the function
$G_n : \mathbb{M}^n \to \mathbb{M}^n$ is the actual operator in $\mathrm{EA}_n$ generating n
c.i.i.d. individuals according to the conditional probability (?).
Of course, for the above abstraction to be valid, the operators
used in $\mathrm{EA}_n$ should actually produce random elements in $\mathbb{M}^n$,
i.e. the newly generated population should be measurable on
$(\mathbb{S}^n, \mathcal{S}^n)$. As most operators in real EAs satisfy this condition
and this is the assumption implicitly taken in previous studies,
we assume that this condition is automatically satisfied.

Given these notations, the evolutionary process of $\mathrm{EA}_n$
can be described by the sequence $(P_k^n)_{k=0}^\infty$, where the initial
population $P_0^n$ is known and the generation of $P_k^n$, $k \in \mathbb{N}^+$,
follows the recurrence equation

$$P_{k+1}^n = (H_n \circ G_n)(P_k^n). \tag{25}$$

Then understanding the population dynamics of the EA can be
achieved by studying the distributions and properties of $P_k^n$.

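Let the IPM of the EA be denoted by $\mathrm{EA}_\infty$. The population
dynamics it produces can be described by the sequence $(P_k^\infty)_{k=0}^\infty$,
where $P_0^\infty$ is known and the generation of $P_k^\infty$, $k \in
\mathbb{N}^+$, follows the recurrence equation

$$P_{k+1}^\infty = (H_\infty \circ G_\infty)(P_k^\infty), \tag{26}$$

in which $G_\infty, H_\infty : \mathbb{M}^\infty \to \mathbb{M}^\infty$ are operators in $\mathrm{EA}_\infty$
modeled after G and H. Then, the convergence of $\mathrm{EA}_\infty$
basically requires that $(P_k^n)_{n=1}^\infty$ converges to $P_k^\infty$ for every
generation k.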

3) The Proposed Framework: As stated before, for each
generation $k \in \mathbb{N}$, the elements of the sequence $(P_k^1, P_k^2, \dots)$
and the limit $P_k^\infty$ are all random elements of different metric
spaces. Therefore, the core of developing our model is to
expand $P_k^n$ to random sequences, while ensuring that this
expansion will not affect modeling the evolutionary process
of the real EA. The result of this step is the sequence of
random sequences $(Q_k^n)_{k=0}^\infty$, with $Q_k^n \in \mathbb{M}^\infty$, for each $n \in \mathbb{N}^+$, which
completely describes the population dynamics of $\mathrm{EA}_n$. For the
population dynamics of $\mathrm{EA}_\infty$, we just let $Q_k^\infty = P_k^\infty$.

The expansion of $P_k^n$ and the relationships between $P_k^n$, $Q_k^n$
and $Q_k^\infty$ are the core of our framework. In the following, we
present them rigorously.

4) The Expansion of $P_k^n$: We start by decomposing each of
$G_n$ and $H_n$ into two operators. One operator is from $\mathbb{M}^\infty$ to $\mathbb{M}^n$.
It corresponds to how to convert random sequences to random
vectors. A natural choice is the projection operator $\pi_n$.

To model the evolutionary process, we also have to define
how to expand random vectors to random sequences. In other
words, we have to define the expansions of $G_n$ and $H_n$, which
are functions from $\mathbb{M}^n$ to $\mathbb{M}^\infty$.

Definition 2 (The expansion of operator). For an operator
$T_n : \mathbb{M}^n \to \mathbb{M}^n$ satisfying the condition that for any $x \in \mathbb{M}^n$,
the elements of $T_n(x)$ are c.i.i.d. given x, the expansion of
$T_n$ is the operator $\widetilde{T}_n : \mathbb{M}^n \to \mathbb{M}^\infty$, satisfying that for any
$x \in \mathbb{M}^n$,

1) $T_n(x) = (\pi_n \circ \widetilde{T}_n)(x)$.

2) The elements of $\widetilde{T}_n(x)$ are c.i.i.d. given x.

In Definition 2, the operator $\widetilde{T}_n$ is the expansion of $T_n$.
Condition 1) ensures that $T_n$ can be safely replaced by $\pi_n \circ
\widetilde{T}_n$. Condition 2) ensures that the paddings for the sequence
are generated according to the same conditional probability
distribution as that used by $T_n$ to generate new individuals.
In other words, if the operator $\widehat{T}_n : \mathbb{M}^n \to \mathbb{M}$ describes how
$T_n$ generates each new individual from the current population,
then $T_n$ is equivalent to invoking $\widehat{T}_n$ independently on the current
population n times, and $\widetilde{T}_n$ is equivalent to invoking $\widehat{T}_n$
independently infinitely many times. Finally, because $T_n$ satisfies
the condition in the premise, the expansion $\widetilde{T}_n$ always exists.
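Definition 2 can be mirrored in code by representing a random sequence lazily as an infinite generator. In the sketch below, which illustrates the definition under our own assumptions and is not part of the framework, the hypothetical `draw_one` plays the role of $\widehat{T}_n$ (how one new individual is produced from the current population), `T_n` calls it n times, `T_tilde_n` calls it indefinitely, and `pi_n` truncates. Running both with the same seed (i.e. on the same sample point) shows condition 1) realization by realization.

```python
import itertools
import random
from typing import Callable, Iterator, List, Sequence

# Hypothetical per-individual rule (the role of \widehat{T}_n): current population -> one new individual.
DrawOne = Callable[[Sequence[float], random.Random], float]

def T_n(pop: Sequence[float], draw_one: DrawOne, rng: random.Random) -> List[float]:
    """Operator M^n -> M^n: n independent calls of draw_one on the same current population."""
    return [draw_one(pop, rng) for _ in range(len(pop))]

def T_tilde_n(pop: Sequence[float], draw_one: DrawOne, rng: random.Random) -> Iterator[float]:
    """Expansion M^n -> M^infinity: the same rule called indefinitely (condition 2)."""
    while True:
        yield draw_one(pop, rng)

def pi_n(seq: Iterator[float], n: int) -> List[float]:
    """Natural projection: keep the first n elements of a lazily generated sequence."""
    return list(itertools.islice(seq, n))

def resample_and_perturb(pop: Sequence[float], rng: random.Random) -> float:
    """Example rule (an assumption): copy a uniformly chosen parent and add N(0, 0.1^2) noise."""
    return rng.choice(pop) + rng.gauss(0.0, 0.1)

pop = [0.0, 1.0, 2.0, 3.0]
direct = T_n(pop, resample_and_perturb, random.Random(42))
via_expansion = pi_n(T_tilde_n(pop, resample_and_perturb, random.Random(42)), len(pop))
print(direct == via_expansion)  # True: condition 1), T_n = pi_n o T~_n, realization by realization
```

The lazy generator is only a computational stand-in for an $\mathbb{S}^\infty$-valued random element: any finite prefix can be materialized on demand, which is all that the projections $\pi_m$ ever inspect.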

By Definition 2, the operators in $\mathrm{EA}_n$ can be decomposed
as $G_n = \pi_n \circ \widetilde{G}_n$ and $H_n = \pi_n \circ \widetilde{H}_n$, respectively. Then, the
evolutionary process of $\mathrm{EA}_n$ can be described by the sequence
of random sequences $(Q_k^n)_{k=0}^\infty$ satisfying
the recurrence equation

$$Q_{k+1}^n = (\widetilde{H}_n \circ \pi_n \circ \widetilde{G}_n)(P_k^n), \tag{27}$$

where $P_k^n$ follows the recurrence equation (25), and $Q_0^n =
(P_0^n, 0, 0, \dots)$. It can also be proved that

$$P_k^n = \pi_n(Q_k^n). \tag{28}$$

Essentially, (27) and (28) describe how the algorithm progresses
in the order $\dots, Q_k^n, P_k^n, Q_{k+1}^n, P_{k+1}^n, \dots$. This fully
characterizes the population dynamics $(P_k^n)_k$, and it is clear
that the extra step of generating $Q_k^n$ does not introduce
modeling errors.
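The order $\dots, Q_k^n, P_k^n, Q_{k+1}^n, P_{k+1}^n, \dots$ described by (27) and (28) can be traced with a lazy-sequence representation. The sketch below assumes two hypothetical c.i.i.d. operators on the real line (uniform resampling for G, and resampling plus Gaussian noise for H, chosen only for brevity) and checks that producing $P_{k+1}^n$ through the expansions gives exactly the same realization as applying $H_n \circ G_n$ directly with the same seed, which is the sense in which the expansion introduces no modeling error.

```python
import itertools
import random
from typing import Callable, Iterator, List, Sequence

DrawOne = Callable[[Sequence[float], random.Random], float]

def expand(draw_one: DrawOne) -> Callable[[Sequence[float], random.Random], Iterator[float]]:
    """Build the expansion of the c.i.i.d. operator defined by draw_one (a lazy infinite sequence)."""
    def t_tilde(pop: Sequence[float], rng: random.Random) -> Iterator[float]:
        while True:
            yield draw_one(pop, rng)
    return t_tilde

def pi_n(seq: Iterator[float], n: int) -> List[float]:
    return list(itertools.islice(seq, n))

def g_one(pop: Sequence[float], rng: random.Random) -> float:
    return rng.choice(pop)                        # assumed stand-in for G: uniform resampling

def h_one(pop: Sequence[float], rng: random.Random) -> float:
    return rng.choice(pop) + rng.gauss(0.0, 0.1)  # assumed stand-in for H: resampling plus noise

G_tilde, H_tilde = expand(g_one), expand(h_one)

def run_via_expansion(P0: Sequence[float], generations: int, seed: int) -> List[float]:
    rng, n, P = random.Random(seed), len(P0), list(P0)
    for _ in range(generations):
        Q_next = H_tilde(pi_n(G_tilde(P, rng), n), rng)  # (27): Q_{k+1} = (H~_n o pi_n o G~_n)(P_k)
        P = pi_n(Q_next, n)                              # (28): P_{k+1} = pi_n(Q_{k+1})
    return P

def run_direct(P0: Sequence[float], generations: int, seed: int) -> List[float]:
    rng, P = random.Random(seed), list(P0)
    for _ in range(generations):
        P_prime = [g_one(P, rng) for _ in P]          # P'_k = G_n(P_k)
        P = [h_one(P_prime, rng) for _ in P_prime]    # P_{k+1} = H_n(P'_k), i.e. (25)
    return P

P0 = [0.0, 0.5, 1.0, 1.5, 2.0]
print(run_via_expansion(P0, 3, seed=7) == run_direct(P0, 3, seed=7))  # True
```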

For $\mathrm{EA}_\infty$, because $P_k^\infty \in \mathbb{M}^\infty$, there is no need for
expansion. For convenience we simply let

$$Q_k^\infty = P_k^\infty \tag{29}$$

for $k \in \mathbb{N}$.

In summary, the relationships between $P_k^n$, $Q_k^n$ and $Q_k^\infty$ are
illustrated in Fig. 2. This is the core of our framework
for modeling the EA and IPMs. For clarity, we also show the
intermediate populations generated by G (denoted by $P'^n_k$),
their expansions (denoted by $Q'^n_k$), and their counterparts
generated by $G_\infty$ (denoted by $Q'^\infty_k$), respectively. How they
fit in the evolutionary process can be clearly seen in the figure.

In Fig. 2, a solid arrow with an operator on it means that
the item at the arrow head equals the result of applying the
operator to the item at the arrow tail. For example, from
the figure it can be read that $Q_1^n = \widetilde{H}_n(P'^n_0)$. A dashed arrow
with a question mark on it marks the place to check whether
convergence in distribution holds. For example, when k = 2,
it should be checked whether $(Q_2^n)_{n=1}^\infty$ converges to $Q_2^\infty$ as
$n \to \infty$.

Finally, one distinction needs special notice. For $\mathrm{EA}_m$ and
$\mathrm{EA}_n$ ($m \ne n$), consider the operators that generate $P'^m_k$ and $P'^n_k$.
It is clear that $G_m : \mathbb{M}^m \to \mathbb{M}^m$ and $G_n : \mathbb{M}^n \to \mathbb{M}^n$ are
two different operators because their domains and ranges are
all different. The distinction still exists when we consider $Q'^n_k$,
though it is more subtle and likely to be ignored. In Fig. 2, if
we consider the operator $\widetilde{G}_n \circ \pi_n : \mathbb{M}^\infty \to \mathbb{M}^\infty$, it is
clear that it uses the same mechanism to generate new individuals
as the one used in $G_n = \pi_n \circ \widetilde{G}_n$, and $Q'^n_k = (\widetilde{G}_n \circ \pi_n)(Q_k^n)$
describes the same population dynamics as that generated by
$P'^n_k = G_n(P_k^n)$. However, if we choose $m \ne n$, $\widetilde{G}_m \circ \pi_m$ and
$\widetilde{G}_n \circ \pi_n$ are both functions from $\mathbb{M}^\infty$ to $\mathbb{M}^\infty$. Therefore, checking
domains and ranges is not enough to tell them apart.
It is important to realize that the distinction between them
lies in the contents of the functions: $\widetilde{G}_m \circ \pi_m$ and $\widetilde{G}_n \circ \pi_n$ use
m and n individuals in the current population, respectively, to generate
the new population, although the new population
contains infinitely many individuals. In short, $\mathrm{EA}_m$ and
$\mathrm{EA}_n$ are the EA instantiated with different population sizes.
Mathematically, the corresponding population dynamics are
modeled by stochastic processes involving different operators,
even though their domains and ranges may be the same. The
same conclusion also holds for the operator H.

Fig. 2. Relationships between $P_k^n$, $Q_k^n$ and $Q_k^\infty$.

C. Convergence of IPMs

Given the framework modeling the EA and IPMs, we first
define convergence in distribution for random elements
of $\mathbb{S}^\infty$. This is standard material. Then, the convergence of
IPMs is defined by requiring that the sequence $(Q_k^1, Q_k^2, \dots)$
converges to $Q_k^\infty$ for every $k \in \mathbb{N}$.

1) Convergence in Distribution: As $Q_k^n$ are random elements
of $\mathbb{S}^\infty$, in the following we define convergence in
distribution for sequences of $\mathbb{S}^\infty$-valued random elements.
Convergence in distribution is equivalent to weak convergence
of the induced probability measures of the random elements. We
use the former theory because, when modeling individuals
and populations as random elements, it is
more intuitive and straightforward. The following material
is standard. It contains the definition of convergence in
distribution for random elements, as well as some useful
definitions and theorems which are used in our analysis of
the simple EA. Most of the material is collected from the
theorems and examples in Sections 1-3 of [10]. The definition
of the Prokhorov metric is collected from Section 11.3 in [?].

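Let $x, y, x_n$, $n \in \mathbb{N}^+$, be random elements defined on
a hidden probability space $(\Omega, \mathcal{F}, P)$ taking values in some
separable metric space $\mathbb{T}$. $\mathbb{T}$ is coupled with the Borel σ-field
$\mathcal{T}$. Let $(\mathbb{T}', \mathcal{T}')$ be a separable measurable space other than
$(\mathbb{T}, \mathcal{T})$.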

Definition 3 (Convergence in distribution). If the sequence
$(x_n)_{n=1}^\infty$ satisfies the condition that $E[h(x_n)] \to E[h(x)]$
for every bounded, continuous function $h : \mathbb{T} \to \mathbb{R}$, we say
$(x_n)_{n=1}^\infty$ converges in distribution to x, and write $x_n \xrightarrow{d} x$.
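A standard textbook example (our illustration, not part of the paper's analysis) shows why Definition 3 is more general than point-wise convergence of p.d.f.s: if $x_n$ is uniform on the grid $\{1/n, 2/n, \dots, 1\}$, then $x_n$ has no p.d.f. at all, yet $x_n \xrightarrow{d} x$ with $x \sim U(0, 1)$. The sketch evaluates the defining condition exactly for one bounded continuous test function, $h = \cos$, for which $E[h(x)] = \sin 1$. Checking a single h is of course not a proof; the definition requires every bounded continuous h.

```python
import math

def expected_h_grid(n: int, h=math.cos) -> float:
    """E[h(x_n)] for x_n uniform on {1/n, 2/n, ..., 1}, computed exactly (no sampling)."""
    return sum(h(i / n) for i in range(1, n + 1)) / n

limit = math.sin(1.0)  # E[cos(x)] for x ~ U(0, 1)
errors = [abs(expected_h_grid(n) - limit) for n in (10, 100, 1000, 10000)]
print(errors)  # shrinks roughly like 0.23 / n
```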

For $\varepsilon > 0$, let $A^\varepsilon = \{y \in \mathbb{T} : d(x, y) < \varepsilon \text{ for some } x \in
A\}$. Then it is well known that convergence in distribution on
separable metric spaces can be metrized by the Prokhorov
metric.

Definition 4 (Prokhorov metric). For two random elements x
and y, the Prokhorov metric is defined as

$$\rho_d(x, y) = \inf\{\varepsilon > 0 : P(x \in A) \le P(y \in A^\varepsilon) + \varepsilon, \ \forall A \in \mathcal{T}\}.$$

Call a set A in $\mathbb{T}$ an x-continuity set if $P(x \in \partial A) = 0$,
where $\partial A$ is the boundary of A.
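For random elements with finitely many atoms, Definition 4 can be evaluated by brute force, which helps build intuition for the metric. The sketch below is an illustration under our own assumptions: real-valued x and y with laws `mu` and `nu` given as dictionaries of atoms, and tiny supports only, since it enumerates all subsets. It relies on two facts that follow from the definition: it suffices to test sets A inside the support of `mu` (shrinking A to that support keeps $P(x \in A)$ and can only shrink $A^\varepsilon$), and the defining condition is monotone in ε and always holds at ε = 1, so the infimum can be found by bisection.

```python
from itertools import combinations
from typing import Dict

Atoms = Dict[float, float]  # support point -> probability mass (finite support on the real line)

def _condition_holds(mu: Atoms, nu: Atoms, eps: float) -> bool:
    """Check mu(A) <= nu(A^eps) + eps for every nonempty A inside the support of mu."""
    support = list(mu)
    for r in range(1, len(support) + 1):
        for A in combinations(support, r):
            mu_A = sum(mu[a] for a in A)
            # nu-mass of the open eps-neighbourhood A^eps = {y : |y - a| < eps for some a in A}
            nu_A_eps = sum(p for y, p in nu.items() if any(abs(y - a) < eps for a in A))
            if mu_A > nu_A_eps + eps + 1e-12:  # small tolerance for floating-point sums
                return False
    return True

def prokhorov(mu: Atoms, nu: Atoms, iters: int = 60) -> float:
    """Bisection for inf{eps > 0 : condition holds}; the condition always holds at eps = 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if _condition_holds(mu, nu, mid):
            hi = mid
        else:
            lo = mid
    return hi

print(prokhorov({0.0: 1.0}, {0.3: 1.0}))             # two point masses 0.3 apart: about 0.3
print(prokhorov({0.0: 1.0}, {0.0: 0.5, 10.0: 0.5}))  # half of the mass moved far away: about 0.5
```

The two examples show the two ways the metric measures closeness: small displacements of mass (the first case) and small amounts of mass moved arbitrarily far (the second case) both yield small distances.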


Theorem 2 (The Portmanteau theorem). The following statements
are equivalent.

1) $x_n \xrightarrow{d} x$.

2) $\limsup_n P(x_n \in F) \le P(x \in F)$ for all closed sets
$F \in \mathcal{T}$.

3) $\liminf_n P(x_n \in G) \ge P(x \in G)$ for all open sets $G \in \mathcal{T}$.

4) $P(x_n \in A) \to P(x \in A)$ for all x-continuity sets $A \in \mathcal{T}$.

Theorem 3 (The mapping theorem). Suppose $h : (\mathbb{T}, \mathcal{T}) \to
(\mathbb{T}', \mathcal{T}')$ is a measurable function. Denote by $D_h$ the set of
discontinuities of h. If $x_n \xrightarrow{d} x$ and $P(x \in D_h) = 0$, then $h(x_n) \xrightarrow{d}
h(x)$.

Let $a, a_n$ be random elements of $\mathbb{T}$ and $b, b_n$ be random
elements of $\mathbb{T}'$; then $(a, b)$ and $(a_n, b_n)$ are random
elements of $\mathbb{T} \times \mathbb{T}'$. Note that $\mathbb{T} \times \mathbb{T}'$ is separable.

Theorem 4 (Convergence in distribution for product spaces).
If a is independent of b and $a_n$ is independent of $b_n$ for all
$n \in \mathbb{N}^+$, then $(a_n, b_n) \xrightarrow{d} (a, b)$ if and only if $a_n \xrightarrow{d} a$ and
$b_n \xrightarrow{d} b$.

Theorem 4 is adapted from Theorem 2.8 (ii) in [10].

Let $z, z_n$, $n \in \mathbb{N}^+$, be random elements of $\mathbb{S}^\infty$.

Theorem 5 (Finite-dimensional convergence). $z_n \xrightarrow{d} z$ if and
only if $\pi_m(z_n) \xrightarrow{d} \pi_m(z)$ for any $m \in \mathbb{N}^+$.

Theorem 5 basically asserts that convergence in distribution
for countably infinite dimensional random elements can
be studied through their finite-dimensional projections. It is
adapted from Example 1.2 and Example 2.4 in [10]. In [10],
the metric space under consideration is $\mathbb{R}^\infty$. However, as both
$\mathbb{R}$ and $\mathbb{S}$ are separable, it is not difficult to adapt the proofs for
$\mathbb{R}^\infty$ to a proof for Theorem 5. Note that $\pi_m(z)$ are random
elements defined on $(\Omega, \mathcal{F}, P)$ taking values in $\mathbb{S}^m$, and
$P[\pi_m(z) \in A] = P(z \in A \times \mathbb{S} \times \mathbb{S} \times \cdots)$ for every $A \in \mathcal{S}^m$.
The same is true for $\pi_m(z_n)$.

2) Convergence of IPM: As convergence in distribution is
properly defined, we can use the theory to define convergence
of IPMs. The idea is that an IPM is convergent (thus justifiable) if
and only if it can predict the limit distribution of the population
dynamics of $\mathrm{EA}_n$ for every generation $k \in \mathbb{N}$ as the population
size n goes to infinity. It captures the limiting behaviors of real
EAs.
Definition 5 (Convergence of IPMs). An infinite population
model $\mathrm{EA}_\infty$ is convergent if and only if for every $k \in \mathbb{N}$,
$Q_k^n \xrightarrow{d} Q_k^\infty$ as $n \to \infty$, where $Q_k^n$, $Q_k^\infty$ and the underlying
$P_k^n$, $P_k^\infty$ are generated according to (27), (29), (25) and (26).

Definition 5 is essentially the core of our proposed framework.
It defines the convergence of IPMs rigorously and clearly.
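Definition 5 concerns whole random sequences, which cannot be verified by simulation, but a simulation can at least check a necessary consequence: since $\pi_1$ is continuous, $Q_k^n \xrightarrow{d} Q_k^\infty$ forces the law of the first individual to converge (Theorem 5). The sketch below rests on assumptions of our own for this example: an EA consisting only of proportionate selection with the hypothetical objective $g(x) = e^{-x^2/2}$ on $\mathbb{R}$, an i.i.d. $N(0, 1)$ initial population, and an IPM whose marginal densities follow $f_{k+1}(x) \propto g(x) f_k(x)$, so that the elements of $Q_k^\infty$ are i.i.d. $N(0, 1/(k+1))$. It compares $E[\cos x]$ of one large simulated population with the predicted $e^{-1/(2(k+1))}$.

```python
import math
import random
from typing import List

def g(x: float) -> float:
    """Hypothetical objective, strictly positive so the selection weights are valid."""
    return math.exp(-x * x / 2)

def proportionate_selection(pop: List[float], rng: random.Random) -> List[float]:
    # n conditionally i.i.d. draws with probabilities proportional to g, done in one call.
    return rng.choices(pop, weights=[g(x) for x in pop], k=len(pop))

def simulate(n: int, generations: int, seed: int = 0) -> List[List[float]]:
    """Return [P_0^n, ..., P_K^n] for an EA that applies proportionate selection only."""
    rng = random.Random(seed)
    pops = [[rng.gauss(0.0, 1.0) for _ in range(n)]]  # i.i.d. N(0, 1) initial population
    for _ in range(generations):
        pops.append(proportionate_selection(pops[-1], rng))
    return pops

pops = simulate(n=100_000, generations=2)
for k, pop in enumerate(pops):
    # By exchangeability, the population average of cos has the same mean as cos(x_{k,1}).
    empirical = sum(math.cos(x) for x in pop) / len(pop)
    predicted = math.exp(-1.0 / (2 * (k + 1)))  # E[cos x] for x ~ N(0, 1/(k+1))
    print(k, round(empirical, 4), round(predicted, 4))
```

The test function cos is bounded and continuous, as Definition 3 requires; a moment such as $E[x^2]$ would not be covered by convergence in distribution alone.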

D. Summary 

In this section, we built a framework to analyze the convergence
of IPMs. The most significant feature of the framework
is that we model the populations as random sequences, thereby
unifying the ranges of the random elements in a common
metric space. Then, we gave a rigorous definition for the
convergence of IPMs based on the theory of convergence in
distribution.

Our framework is general. It only requires that operators
produce c.i.i.d. individuals. In fact, any EA and IPM satisfying
this assumption can be put into the framework. However, to
obtain meaningful results, the convergence of IPMs has to be
proved. This may require extra analyses of the IPM and the inner
mechanisms of the operators. These analyses are presented in
Section IV.

Finally, one point is worth discussing. In our framework,
the expansion of an operator is carried out by padding the
finite population with c.i.i.d. individuals following the same
marginal distribution. Then a question naturally arises: why
not pad the finite population with some other random elements,
or just with the constant 0? This idea deserves consideration.
After all, if the expansion is conducted by padding 0s, the
requirement of c.i.i.d. can be discarded, and the framework
and the convergence of IPMs stay the same. However, we
did not choose this approach. The reason is that padding the
population with c.i.i.d. individuals facilitates analysis of the
IPM. For example, in our analysis in Section IV, the sufficient
conditions for the convergence of IPMs require us to consider
$\Gamma_m(Q_k^n)$, where $\Gamma$ is the operator under analysis. $\Gamma_m$ uses the
first m elements of $Q_k^n$ to generate new individuals. Now if
$m > n$ and $Q_k^n$ is expanded from $P_k^n$ by padding 0s, $\Gamma_m(Q_k^n)$
does not make any sense because $m - n$ of the m individuals used by
$\Gamma_m$ are 0s. This restricts our options in proving the
convergence of IPMs.
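The difficulty with zero padding can be seen in a few lines. In this sketch, both the operator and the numbers are hypothetical: $\Gamma_m$ is taken to be uniform resampling from the first m elements of its input, the current finite population sits at the point 5, and m = 4n. With zero padding, most of the m individuals that $\Gamma_m$ uses are the artificial 0s; with c.i.i.d. padding, produced here by the same resampling rule that generated the population, every one of them has the correct conditional law.

```python
import itertools
import random
from typing import Iterator, List

def gamma_m(seq: Iterator[float], m: int, rng: random.Random) -> List[float]:
    """Hypothetical Gamma_m: resample m new individuals uniformly from the first m elements."""
    parents = list(itertools.islice(seq, m))
    return [rng.choice(parents) for _ in range(m)]

def ciid_expansion(prev_pop: List[float], rng: random.Random) -> Iterator[float]:
    """c.i.i.d. padding: keep drawing with the rule that produced the current population."""
    while True:
        yield rng.choice(prev_pop)

rng = random.Random(3)
n, m = 100, 400
prev_pop = [5.0] * n                                            # previous population, all at x = 5
P_k = list(itertools.islice(ciid_expansion(prev_pop, rng), n))  # the actual population of size n

zero_padded = itertools.chain(P_k, itertools.repeat(0.0))       # (P_k, 0, 0, ...)
ciid_padded = itertools.chain(P_k, ciid_expansion(prev_pop, rng))

frac_zero = sum(x == 0.0 for x in gamma_m(zero_padded, m, rng)) / m
frac_ciid = sum(x == 0.0 for x in gamma_m(ciid_padded, m, rng)) / m
print(frac_zero)  # roughly (m - n) / m = 0.75
print(frac_ciid)  # 0.0
```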

IV. Analysis of the Simple EA 

In this section, we analyze the simple EA using our framework.
In Section IV-A we give sufficient conditions for the
convergence of IPMs. To appreciate their necessity, consider the
framework in Fig. 2. To prove the convergence of the IPM, by
Definition 5 we should check whether $Q_k^n \xrightarrow{d} Q_k^\infty$ as $n \to \infty$
for every $k \in \mathbb{N}$. However, this direct approach is usually not
viable. Checking the convergence manually for all values of
k is wearisome and sometimes difficult. This is because as k
increases, the distributions of $Q_k^n$ and $Q_k^\infty$ change. Therefore,
the method needed to prove $Q_k^n \xrightarrow{d} Q_k^\infty$ as $n \to \infty$ may be
different from the method needed to prove $Q_{k+1}^n \xrightarrow{d} Q_{k+1}^\infty$
as $n \to \infty$. Of course, after proving the cases for several
values of k, it may be possible to discover some patterns in
the proofs, which can be extended to cover other values of k,
thus proving the convergence of the IPM. But this process is
still tedious and uncertain.

In view of this, a “smarter” way to prove the convergence
of the IPM may be the following method. First, the convergence
of the IPM for one iteration step is proved for each operator.
Then, the results are combined and extended to cover the
whole population dynamics. The idea is that if the convergence
holds for one generation number k, then it can be passed on
automatically to all subsequent generations. For example, in
Fig. 2, consider the operators $G_\infty$ and $\widetilde{G}_n \circ \pi_n$. The first step
is to prove that

$$\text{if } Q_k^n \xrightarrow{d} Q_k^\infty \text{ as } n \to \infty, \text{ then } Q'^n_k \xrightarrow{d} Q'^\infty_k \text{ as } n \to \infty. \tag{30}$$

In other words, $G_\infty$ can model $\widetilde{G}_n \circ \pi_n$ for one iteration step.
Then, after obtaining similar results for $H_\infty$ and $\widetilde{H}_n \circ \pi_n$, we
combine the results and the convergence of the overall
IPM is proved.

However, this approach still seems difficult because we
have to prove that this pass-on relation (30) holds for every k.
In essence, this corresponds to whether the operators in the IPM
can be stacked together and iterated for any number of steps.
This is the issue of the stacking of operators and iterating
the algorithm. Therefore, in Section IV-A we give sufficient
conditions for this to hold. These conditions are important. If
they hold, proving the convergence of the overall IPM can be
broken down into proving the convergence of one iteration step
of each operator in the IPM. This greatly reduces the difficulty in
deriving the proof.

To model real EAs, the IPM has to be constructed reasonably.
As shown in Section II, exchangeability cannot yield the
transition equation for the simple EA. This creates the research
problem of finding a suitable modeling assumption to derive
the IPM. Therefore, in Section IV-B we discuss the issue and
propose to use i.i.d. as the modeling assumption in the IPM.

Then, we use the sufficient conditions to prove the convergence
of IPMs for various operators. The operators of mutation
and k-ary recombination are analyzed in Section IV-C
and Section IV-D, respectively. In Section IV-E we summarize
this section and discuss our results.

A. Sufficient Conditions for Convergence of IPMs 

To derive sufficient conditions for the convergence of the 
overall IPM, the core step is to derive conditions under which 
the operators in the IPM can be stacked and iterated. 

As before, let $\mathrm{EA}_n$ and $\mathrm{EA}_\infty$ denote the EA with population
size n and the IPM under analysis, respectively. Let
$\Gamma$ be an operator in the EA, and let $\Gamma_n : \mathbb{M}^\infty \to \mathbb{M}^\infty$ and
$\Gamma_\infty : \mathbb{M}^\infty \to \mathbb{M}^\infty$ be its corresponding expanded operators in
$\mathrm{EA}_n$ and $\mathrm{EA}_\infty$, respectively. Note that $\Gamma_n$ and $\Gamma_\infty$ generate
random elements of $\mathbb{S}^\infty$. To give an example, $\Gamma_n$ and $\Gamma_\infty$ may
correspond to $\widetilde{G}_n \circ \pi_n$ and $G_\infty$ in Fig. 2, respectively.

We define a property under which $\Gamma_\infty$ can be stacked with
some other operator $\Phi_\infty$ satisfying the same property without
affecting the convergence of the overall IPM. In other words,
for an EA using $\Phi$ and $\Gamma$ as its operators, we can prove the
convergence of the IPM by studying $\Phi$ and $\Gamma$ separately. We call
this property “the stacking property”. It is worth noting that if
$\Phi = \Gamma$, then this property guarantees that $\Gamma_\infty$ can be iterated
any number of times. Therefore it also resolves the issue
of iterating the algorithm.

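Let $A_\alpha$ be random elements in $\mathbb{M}^\infty$ for $\alpha \in \mathbb{N}^+ \cup \{\infty\}$.
We have the following results.

Definition 6 (The stacking property). Given $U \subset \mathbb{M}^\infty$, if
for any converging sequence $A_n \xrightarrow{d} A_\infty \in U$, $\Gamma_n(A_n) \xrightarrow{d}
\Gamma_\infty(A_\infty) \in U$ as $n \to \infty$ always holds, then we say that $\Gamma_\infty$
has the stacking property on U.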

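Theorem 6. If $\Phi_\infty$ and $\Gamma_\infty$ have the stacking property on U,
then $\Phi_\infty \circ \Gamma_\infty$ has the stacking property on U.

Proof. For any converging sequence $A_n \xrightarrow{d} A_\infty \in U \subset
\mathbb{M}^\infty$, because $\Gamma_\infty$ has the stacking property on U, we have
$\Gamma_n(A_n) \xrightarrow{d} \Gamma_\infty(A_\infty) \in U$. Then, $(\Gamma_n(A_n))_n$ is also a
converging sequence. Since $\Phi_\infty$ has the stacking property on
U, by definition we immediately have $(\Phi_n \circ \Gamma_n)(A_n) \xrightarrow{d}
(\Phi_\infty \circ \Gamma_\infty)(A_\infty) \in U$. □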

By Theorem 6, any composition of $\Phi_\infty$ and $\Gamma_\infty$ has the
stacking property on U. In particular, $(\Gamma_\infty)^k$ has the stacking
property on U for every $k \in \mathbb{N}^+$. The stacking property essentially guarantees
that the convergence on U can be passed on to subsequent
generations.

Theorem 7 (Sufficient condition 1). For an EA consisting of a single operator $\Gamma$, let $\Gamma$ be modeled by $\Gamma_\infty$ in the IPM $\mathrm{EA}_\infty$, and let $\Gamma_\infty$ have the stacking property on some space $U \subset \mathbb{M}^\infty$. If the initial populations of both EA and $\mathrm{EA}_\infty$ follow the same distribution $P_X$ for some $X \in U$, then $\mathrm{EA}_\infty$ converges.

Proof. Note that for $\mathrm{EA}_n$ and $\mathrm{EA}_\infty$, the $k$th populations they generate are $(\Gamma_n)^k(X)$ and $(\Gamma_\infty)^k(X)$, respectively. By Theorem 6, $(\Gamma_\infty)^k$ has the stacking property on $U$. Because the sequence $(X, X, \ldots)$ converges to $X \in U$, by Definition 6, $(\Gamma_n)^k(X) \xrightarrow{d} (\Gamma_\infty)^k(X) \in U$ as $n \to \infty$. Since this holds for any $k \in \mathbb{N}$, by Definition 5, $\mathrm{EA}_\infty$ converges. □
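As a rough numerical illustration of Theorem 7 (not part of the analysis above), the Python sketch below assumes a one-dimensional search space, an i.i.d. Gaussian initial population $N(\mu_0, \sigma_0^2)$, and Gaussian mutation with step size $s$; the names `mutate`, `run_ea` and `ipm_marginal` are hypothetical. It applies the finite-population operator $k$ times and compares the empirical mean and standard deviation with the marginal predicted by $(\Gamma_\infty)^k$, namely $N(\mu_0, \sigma_0^2 + k s^2)$. For pure mutation every finite population is already i.i.d. with this marginal, so any gap at small $n$ is sampling noise; the sketch only shows how the IPM is iterated generation by generation.

```python
import numpy as np


def mutate(pop, step_sd, rng):
    """Finite-population mutation: add an independent N(0, step_sd^2) draw to each individual."""
    return pop + rng.normal(0.0, step_sd, size=pop.shape)


def run_ea(n, k, mu0, sd0, step_sd, seed=0):
    """Apply the mutation operator k times to an i.i.d. N(mu0, sd0^2) population of size n."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(mu0, sd0, size=n)
    for _ in range(k):
        pop = mutate(pop, step_sd, rng)
    return pop


def ipm_marginal(k, mu0, sd0, step_sd):
    """Marginal of (Gamma_inf)^k on the Gaussian input: each generation convolves with N(0, s^2)."""
    return mu0, float(np.sqrt(sd0 ** 2 + k * step_sd ** 2))


if __name__ == "__main__":
    k, mu0, sd0, step = 5, 1.0, 2.0, 0.5
    mean, sd = ipm_marginal(k, mu0, sd0, step)
    for n in (10, 1_000, 100_000):
        pop = run_ea(n, k, mu0, sd0, step)
        print(f"n={n:>7}: mean={pop.mean():.3f}, sd={pop.std():.3f} | IPM: {mean:.3f}, {sd:.3f}")
```

Only the first two moments of the marginal are compared here; this is a sanity check of the iterated transition equation, not a test of convergence in distribution.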

By Theorem 6 and Theorem 7, we can prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property. Compared with (30), it is clear that the stacking property is only a sufficient condition. This is because the stacking property requires that $(\Gamma_n(A_n))_n$ converges to a point in $U$ for any converging sequence $(A_n)_n$ satisfying $A_n \xrightarrow{d} A_\infty \in U$, while (30) only requires the convergence to hold for the specific converging sequence $(Q^k_n)_n$. Since $(Q^k_n)_n$ is generated by the algorithm, it may have special characteristics regarding convergence rate, distributions, etc. On the other hand, checking the stacking property may be easier than proving (30), because the stacking property is independent of the generation number $k$.

Another point worth discussing is the introduction of $U$ in Definition 6. Of course, if we omit $U$ (or equivalently let $U = \mathbb{M}^\infty$), the stacking property becomes “stronger”, because if it holds, the convergence of the IPM is proved for the EA starting from any initial population. However, in that case the condition is so restrictive that the stacking property cannot be proved for many operators.

In Definition 6, it is required that $\Gamma_n(A_n) \xrightarrow{d} \Gamma_\infty(A_\infty) \in U$ as $n \to \infty$. The sequence under investigation, $(\Gamma_n(A_n))_n$, is produced by a sequence of changing operators $(\Gamma_n)_n$ acting on a sequence of changing inputs $(A_n)_n$. As both the operators and the inputs change, the convergence of $(\Gamma_n(A_n))_n$ may still be difficult to prove. Therefore, in the following, we further derive two sufficient conditions for the stacking property.

First, let $B_{\alpha,\beta} = \Gamma_\beta(A_\alpha)$, where $\alpha, \beta \in \mathbb{N}^+ \cup \{\infty\}$. Then we have the following sufficient conditions for the stacking property.

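Theorem 8 (Sufficient condition 2). For a space $U$ and all converging sequences $A_n \xrightarrow{d} A_\infty \in U$, if the following two conditions

1) $\exists M \in \mathbb{N}^+$ such that for all $m > M$, $B_{n,m} \xrightarrow{d} B_{\infty,m}$ uniformly as $n \to \infty$, i.e. $\sup_{m > M} \rho(B_{n,m}, B_{\infty,m}) \to 0$ as $n \to \infty$,

2) $B_{\infty,m} \xrightarrow{d} B_{\infty,\infty} \in U$ as $m \to \infty$,

are both met, then $\Gamma_\infty$ has the stacking property on $U$.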

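Theorem 9 (Sufficient condition 3). For a space $U$ and all converging sequences $A_n \xrightarrow{d} A_\infty \in U$, if the following two conditions

1) $\exists N \in \mathbb{N}^+$ such that for all $n > N$, $B_{n,m} \xrightarrow{d} B_{n,\infty}$ uniformly as $m \to \infty$, i.e. $\sup_{n > N} \rho(B_{n,m}, B_{n,\infty}) \to 0$ as $m \to \infty$,

2) $B_{n,\infty} \xrightarrow{d} B_{\infty,\infty} \in U$ as $n \to \infty$,

are both met, then $\Gamma_\infty$ has the stacking property on $U$.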

In the following, we prove Theorem 8. Since Theorem 8 and Theorem 9 are symmetric in $m$ and $n$, proving one of them leads to the other. Recall that $\rho$ is the Prokhorov metric (Definition 4) and that $\vee$ takes the maximum in the expression.

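Proof. For every $\varepsilon > 0$, by condition 1 in Theorem 8, $\exists N$ s.t. $\sup_{m > M} \rho(B_{n,m}, B_{\infty,m}) < \varepsilon/2$ for all $n > N$. By condition 2 in Theorem 8, $\exists M'$ s.t. $\rho(B_{\infty,m}, B_{\infty,\infty}) < \varepsilon/2$ for all $m > M'$. Now for all $l > M \vee N \vee M'$,

$$\rho(B_{l,l}, B_{\infty,\infty}) \le \rho(B_{l,l}, B_{\infty,l}) + \rho(B_{\infty,l}, B_{\infty,\infty}) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Therefore, $B_{n,n} \xrightarrow{d} B_{\infty,\infty} \in U$ as $n \to \infty$. □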

To understand these two theorems, consider the relationships between $A_\alpha$ and $B_{\alpha,\beta}$ illustrated in Fig. 3. In the figure, the solid arrow represents the premise in Definition 6, i.e. $A_n \xrightarrow{d} A_\infty \in U$ as $n \to \infty$. The double-line arrow represents the direction to be proved for the stacking property on $U$, i.e. $B_{n,n} \xrightarrow{d} B_{\infty,\infty} \in U$ as $n \to \infty$. The dashed arrows are the directions to be checked for Theorem 8 to hold. The wavy arrows are the directions to be checked for Theorem 9 to hold.

Now it is clear that Theorem 8 and Theorem 9 bring benefits. For example, for Theorem 8, instead of proving the convergence of a sequence generated by changing operators and inputs ($B_{n,n} \xrightarrow{d} B_{\infty,\infty}$), this sufficient condition considers the convergence of sequences generated by the same operator on changing inputs ($B_{n,m} \xrightarrow{d} B_{\infty,m}$) and of the sequence generated by changing operators on the same input ($B_{\infty,m} \xrightarrow{d} B_{\infty,\infty}$).

Fig. 3. Relationships between $A_\alpha$ and $B_{\alpha,\beta}$.

The reason we introduce $M$ and $N$ in Theorem 8 and Theorem 9, respectively, is to exclude some of the starting columns and rows in Fig. 3 if necessary. This is useful in proving the convergence of the IPM of the $k$-ary recombination operator.

B. The I.I.D. Assumption

In this section, we address the issue of how to construct the IPM. This issue also corresponds to how to choose the space $U$ for the stacking property.

Before introducing the i.i.d. assumption, let us give an example. Consider the space $U = \{x \in \mathbb{M}^\infty \mid P[x = (c, c, \ldots)] = 1 \text{ for some } c \in \mathbb{S}\}$. If the initial population follows some distribution from $U$, then the population consists of identical individuals. If an EA with proportionate selection and crossover operates on this initial population, then all subsequent populations stay the same as the initial population. An IPM of this EA can be easily constructed, and it can be easily proved that the stacking property holds as long as the EA chooses its initial population from $U$. However, this is not a very interesting case, because $U$ is too small to model real EAs.

On the other hand, if $U = \{x \in \mathbb{M}^\infty \mid x \text{ is exchangeable}\}$, $U$ may be too big to derive meaningful results. This can be seen from our analysis in Section III, which shows that under exchangeability it is not possible to derive transition equations of marginal distributions for the simple EA.

Therefore, choosing $U$ should strike a balance between the capacity and the complexity of the IPM. In the following analysis, we choose $U$ to be $U_I = \{x \in \mathbb{M}^\infty \mid x \text{ is i.i.d.}\}$. IPMs of EAs are constructed using the i.i.d. assumption, and we prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property on $U_I$.

We choose $U_I$ for the following reasons. First, in the real world, many EAs generate i.i.d. initial populations, so this assumption is realistic. Second, i.i.d. random elements have the same marginal distributions, so the IPM can be described by transition equations of marginal distributions. Finally, there is an abundant literature on the convergence laws and limit theorems of i.i.d. sequences. Therefore, the difficulty in constructing the IPM can be greatly reduced compared with using other modeling assumptions.

In the following, we show how to construct the IPM under the i.i.d. assumption. This process also relates to condition 2 in Theorem 8. It essentially describes how the IPM generates new populations.

Let the operator in the EA be $\Gamma$, and the corresponding operator in $\mathrm{EA}_m$ be $\Gamma_m : \mathbb{M}^\infty \to \mathbb{M}^\infty$. Recall that in our framework we only study EAs consisting of c.i.i.d. operators; therefore $\Gamma_m$ generates c.i.i.d. outputs by using the first $m$ elements of its input. The process by which $\Gamma_m$ generates each output can be described by the conditional p.d.f. $f_{\Gamma_m}(x \mid y_1, \ldots, y_m)$. Let $a = (a_i)_{i=1}^\infty \in \mathbb{M}^\infty$ be the input and $b = (b_i)_{i=1}^\infty = \Gamma_m(a)$ be the output; then the distribution of $b$ can be completely described by its finite-dimensional p.d.f.s

$$f_{\pi_l(b)}(x_1, \ldots, x_l) = \int \prod_{i=1}^{l} f_{\Gamma_m}(x_i \mid y_1, \ldots, y_m) \cdot f_{\pi_m(a)}(y_1, \ldots, y_m)\, dy_1 \cdots dy_m \tag{31}$$

for every $l \in \mathbb{N}^+$.

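To derive the IPM $\Gamma_\infty$ for $\Gamma$, consider the case when $l = 1$ and $a \in U_I$ in (31). Noting that in this case $f_{\pi_m(a)}(y_1, \ldots, y_m) = \prod_{i=1}^{m} f_{a_1}(y_i)$, we have

$$f_{b_1}(x) = \int f_{\Gamma_m}(x \mid y_1, \ldots, y_m) \prod_{i=1}^{m} f_{a_1}(y_i)\, dy_1 \cdots dy_m. \tag{32}$$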


Now taking $m \to \infty$, (32) in the limit becomes the transition equation describing how $\Gamma_\infty$ generates each new individual. Let the transition equation be

$$\lim_{m \to \infty} f_{b_1} = \mathcal{T}_\Gamma[f_{a_1}], \tag{33}$$

and let $c = (c_i)_{i=1}^\infty = \Gamma_\infty(a)$. Then how $\Gamma_\infty$ generates $l$ individuals can be described by the finite-dimensional p.d.f.s of $c$:

$$f_{\pi_l(c)}(x_1, \ldots, x_l) = \prod_{i=1}^{l} \mathcal{T}_\Gamma[f_{a_1}](x_i) \tag{34}$$

for every $l \in \mathbb{N}^+$. Overall, (34) describes the mapping $\Gamma_\infty : U_I \to U_I$.
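To make (32) concrete for a single offspring ($l = 1$), consider a kernel that depends on only one parent, as mutation does. The Python sketch below rests on assumed choices that do not come from the text: a one-dimensional space, a Gaussian marginal $f_{a_1} = N(\mu, \sigma^2)$, and a Gaussian mutation kernel $f_\Gamma(x \mid y) = N(x; y, s^2)$. It evaluates the integral in (32) with the trapezoidal rule on a grid and compares the result with the closed form $N(\mu, \sigma^2 + s^2)$, i.e. the convolution of the parent density with the perturbation density. The helper names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def trapezoid(values, dx):
    """Composite trapezoidal rule along the last axis for a uniform grid spacing dx."""
    return dx * (values.sum(axis=-1) - 0.5 * (values[..., 0] + values[..., -1]))


def offspring_density(x, parent_pdf, kernel, y_grid):
    """Evaluate f_b(x) = integral of kernel(x | y) * parent_pdf(y) dy at each point of x."""
    dy = y_grid[1] - y_grid[0]
    integrand = kernel(x[:, None], y_grid[None, :]) * parent_pdf(y_grid)[None, :]
    return trapezoid(integrand, dy)


if __name__ == "__main__":
    mu, sigma, s = 0.5, 1.0, 0.7
    y_grid = np.linspace(-10.0, 10.0, 4001)
    x = np.linspace(-4.0, 4.0, 81)
    numeric = offspring_density(
        x,
        parent_pdf=lambda y: gaussian_pdf(y, mu, sigma),
        kernel=lambda x_, y_: gaussian_pdf(x_, y_, s),  # mutation: offspring = parent + N(0, s^2)
        y_grid=y_grid,
    )
    exact = gaussian_pdf(x, mu, np.sqrt(sigma ** 2 + s ** 2))
    print("max abs error:", np.max(np.abs(numeric - exact)))
```

Because the integrand factorizes over parents when $a \in U_I$, kernels that use more parents only add dimensions to the same integral; a grid quadrature like this one quickly becomes impractical beyond one or two parents.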

To better understand the construction, it is important to realize that for $\Gamma_\infty$ both the input and the output are i.i.d. In other words, $\Gamma_\infty$ generates i.i.d. population dynamics to simulate the real population dynamics produced by $\Gamma$; the only difference is that the transition equation in $\Gamma_\infty$ is derived by mimicking how $\Gamma$ generates each new individual on i.i.d. inputs and taking the population size to infinity. In fact, if the stacking property on $U_I$ is proved and the initial population is i.i.d., $\Gamma_\infty$ always takes i.i.d. inputs and produces i.i.d. outputs. The behavior of $\Gamma_\infty$ on $U_I$ is well-defined. On the other hand, $\Gamma_\infty(A)$ for $A \notin U_I$ is not defined in the construction. This leaves us freedom: we can define $\Gamma_\infty(A)$ for $A \notin U_I$ freely to facilitate proving the stacking property of $\Gamma_\infty$. In particular, $B_{n,\infty}$ for $n \in \mathbb{N}^+$ in Fig. 3 can be defined freely to facilitate the analysis.


In fact, under the i.i.d. assumption, deriving the transition equation for most operators is the easy part. The more difficult part is to prove the stacking property of $\Gamma_\infty$ on $U_I$. For example, consider the transition equation constructed by Qi et al. in [1], which models the joint effects of proportionate selection and mutation. As our analysis in Section III shows, it does not hold under the assumption of exchangeability. However, if the modeling assumption is i.i.d., the transition equation can be immediately proved (see our analysis in Section III). This also applies to the transition equation built by the same authors for the uniform crossover operator (in Theorem 1 of [2]), where the transition equation is in fact constructed under the i.i.d. assumption. Therefore, in the following analyses, we do not refer to the explicit form of the transition equation unless it is needed. We only assume that the transition equation is successfully constructed and that it has the form (34), which is derived from (32) as $m \to \infty$.

The construction of the IPM also relates partly to condition 2 in Theorem 8. Comparing with this condition, it can be seen that for a successfully constructed $\Gamma_\infty$, the following two facts are proved in the construction.

1) $\pi_1(B_{\infty,m}) \xrightarrow{d} \pi_1(B_{\infty,\infty})$ as $m \to \infty$.

2) $B_{\infty,\infty} \in U_I$.

Of course, these two facts are not sufficient for this condition to hold. One still needs to prove $B_{\infty,m} \xrightarrow{d} B_{\infty,\infty}$ as $m \to \infty$. In other words, one has to consider the convergence of finite-dimensional distributions.

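Finally, we sometimes write $\mathbf{x}$ for $x_1, \ldots, x_l$ if $l$ is clear from the context. For example, (31) can be rewritten as

$$f_{\pi_l(b)}(\mathbf{x}) = \int \prod_{i=1}^{l} f_{\Gamma_m}(x_i \mid \mathbf{y}) \cdot f_{\pi_m(a)}(\mathbf{y})\, d\mathbf{y}.$$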


C. Analysis of the Mutation Operator 

Having derived sufficient conditions for the stacking property and constructed the IPM, we first prove the convergence of the IPM of the mutation operator. Mutation adds an i.i.d. random vector to each individual in the population. If the current population is $A \in \mathbb{M}^\infty$, then the population after mutation satisfies $\mathcal{L}[B = \Gamma_m(A)] = \mathcal{L}(A + X)$ for all $m \in \mathbb{N}^+$, where $X \in U_I$ is a random element decided by the mutation operator. As the content of the mutation operator does not depend on $m$, we simply write $\Gamma$ for $\Gamma_m$. For example, $X$ may be the sequence $(x_1, x_2, \ldots)$ with all $x_i \in \mathbb{R}^d$ mutually independent and $x_i \sim N(0, I_d)$ for all $i \in \mathbb{N}^+$, where $N(a, B)$ is the multivariate normal distribution with mean $a$ and covariance matrix $B$, and $I_d$ is the $d$-dimensional identity matrix. Note that every time $\Gamma$ is invoked, it generates perturbations independently. For example, let $A_1$ and $A_2$ be two populations; then we can write $\Gamma(A_i) = A_i + X_i$ for $i = 1, 2$, where $\mathcal{L}(X_1) = \mathcal{L}(X_2) = \mathcal{L}(X)$, $\{X_i\}_{i=1,2}$ are mutually independent, and they are independent of $\{A_i\}_{i=1,2}$.

Next, consider $\Gamma_\infty$. Recall that, as an IPM, $\Gamma_\infty$ simulates real population dynamics by taking i.i.d. inputs and producing i.i.d. outputs. If the marginal p.d.f.s of $A$ and $X$ are $f_A$ and $f_X$, respectively, then $\Gamma_\infty(A)$ generates i.i.d. individuals whose p.d.f.s are $f_A * f_X$, where $*$ stands for convolution. Given the construction, we can prove the stacking property of $\Gamma_\infty$.


Theorem 10 (Mutation). Let $\Gamma$ be the mutation operator, and $\Gamma_\infty$ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then $\Gamma_\infty$ has the stacking property on $U_I$.

Proof. We use the notations and premises in Theorem 8. Refer to Fig. 3. In particular, the sequence $(A_n)$ and the limit $A_\infty$ are given and $A_n \xrightarrow{d} A_\infty \in U_I$ as $n \to \infty$.

Apparently,

$$[\Gamma_m(A_\infty)]_{m=1}^\infty = [\Gamma(A_\infty), \Gamma(A_\infty), \ldots] \xrightarrow{d} \Gamma(A_\infty) = \Gamma_\infty(A_\infty) \in U_I.$$

Therefore, condition 2 in Theorem 8 is satisfied.

Noting that condition 1 in Theorem 8 is equivalent to $\Gamma(A_n) \xrightarrow{d} \Gamma(A_\infty)$ (because $B_{n,m} = \Gamma(A_n)$ does not depend on $m$), we prove this condition by proving that $\pi_i[\Gamma(A_n)] \xrightarrow{d} \pi_i[\Gamma(A_\infty)]$ for all $i \in \mathbb{N}^+$. Then, by Theorem 5, condition 1 in Theorem 8 holds. As both conditions in Theorem 8 are satisfied, the theorem is proved.

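Now we prove $\pi_i[\Gamma(A_n)] \xrightarrow{d} \pi_i[\Gamma(A_\infty)]$ for all $i \in \mathbb{N}^+$. First, note that $\Gamma(A_\alpha) = A_\alpha + X_\alpha$ for all $\alpha \in \mathbb{N}^+ \cup \{\infty\}$, where $\{X_\alpha \in \mathbb{M}^\infty\}$ are i.i.d. and independent of $\{A_\alpha \in \mathbb{M}^\infty\}$. In addition, for every $\alpha$, $\mathcal{L}(X_\alpha) = \mathcal{L}(X)$.

Since $\mathcal{L}(X_\alpha) = \mathcal{L}(X)$, it is apparent that $X_n \xrightarrow{d} X_\infty$. Then, by Theorem 5, we have $\pi_i(X_n) \xrightarrow{d} \pi_i(X_\infty)$ and $\pi_i(A_n) \xrightarrow{d} \pi_i(A_\infty)$.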

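Consider the product space $\mathbb{S}^i \times \mathbb{S}^i$. It is both separable and complete. Since $\pi_i(A_\alpha)$ and $\pi_i(X_\alpha)$ are independent, by Theorem 4 it follows that

$$\begin{bmatrix} \pi_i(A_n) \\ \pi_i(X_n) \end{bmatrix} \xrightarrow{d} \begin{bmatrix} \pi_i(A_\infty) \\ \pi_i(X_\infty) \end{bmatrix}. \tag{35}$$

Note that

$$\pi_i[\Gamma(A_\alpha)] = \begin{bmatrix} I & I \end{bmatrix} \begin{bmatrix} \pi_i(A_\alpha) \\ \pi_i(X_\alpha) \end{bmatrix} = h\left( \begin{bmatrix} \pi_i(A_\alpha) \\ \pi_i(X_\alpha) \end{bmatrix} \right), \tag{36}$$

where $I$ is the identity matrix of appropriate dimension and $h : \mathbb{S}^i \times \mathbb{S}^i \to \mathbb{S}^i$ is the function satisfying $h\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \begin{bmatrix} I & I \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$. Apparently $h$ is continuous. Then, by (35), (36) and Theorem 3, $\pi_i[\Gamma(A_n)] \xrightarrow{d} \pi_i[\Gamma(A_\infty)]$ for any $i \in \mathbb{N}^+$. □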


In the proof, we concatenate the input $(A_n)$ and the randomness $(X_n)$ of the mutation operator in a common product space, and represent $\Gamma$ as a continuous function on that space. This technique is also used when analyzing other operators.


D. Analysis of k-ary Recombination 

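Consider the $k$-ary recombination operator and denote it by $\Gamma$. In $\mathrm{EA}_m$, the operator is denoted by $\Gamma_m$, which works as follows. To generate a new individual, it first samples $k$ individuals from the current $m$-sized population randomly with replacement. Assume the current population consists of $\{x_j\}_{j=1}^m$ and the selected $k$ parents are $\{y_i\}_{i=1}^k$; then the selection follows the probability

$$P(y_i = x_j) = \frac{1}{m} \quad \text{for all } i \in \{1, \ldots, k\},\ j \in \{1, \ldots, m\}. \tag{37}$$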

After the $k$ parents are selected, $\Gamma_m$ produces a new individual $x$ following the formula

$$x = \sum_{i=1}^{k} U_i y_i, \tag{38}$$

where $\{U_i\}_{i=1}^k$ are random elements of $\mathbb{R}^{d \times d}$ (recall that $x$ and $y_i$ are random elements of $\mathbb{S} = \mathbb{R}^d$ modeling individuals in our framework). $\{U_i\}_{i=1}^k$ are also independent of $\{y_i\}_{i=1}^k$, and the joint distribution of $(U_i)_{i=1}^k$ is decided by the inner mechanism of $\Gamma$. Overall, $\Gamma_m$ generates the next population by repeatedly using this procedure to generate new individuals independently.
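The procedure in (37) and (38) can be sketched in Python as follows. This is a minimal illustration, assuming that individuals are rows of a NumPy array and that the inner mechanism of $\Gamma$ is supplied as a function returning the $k$ weight matrices $U_1, \ldots, U_k$; the names `recombine`, `sample_weights` and `mean_weights` are hypothetical. Each offspring draws $k$ parent indices uniformly with replacement, so $P(y_i = x_j) = 1/m$, draws fresh weights independently of the parents, and returns $\sum_i U_i y_i$.

```python
import numpy as np


def recombine(pop, k, sample_weights, n_offspring, rng):
    """Sketch of the k-ary recombination operator Gamma_m.

    pop: (m, d) array holding the current m-sized population.
    sample_weights: callable(rng, k, d) returning a (k, d, d) array with U_1, ..., U_k.
    """
    m, d = pop.shape
    out = np.empty((n_offspring, d))
    for j in range(n_offspring):
        idx = rng.integers(0, m, size=k)       # k parents, uniform with replacement, as in (37)
        parents = pop[idx]                      # (k, d) array of y_1, ..., y_k
        U = sample_weights(rng, k, d)           # weights drawn independently of the parents
        out[j] = np.einsum("iab,ib->a", U, parents)  # x = sum_i U_i y_i, as in (38)
    return out


def mean_weights(rng, k, d):
    """U_1 = ... = U_k = I / k; for k = 2 this is the mean-of-two-parents crossover."""
    return np.stack([np.eye(d) / k] * k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pop = np.array([[0.0, 0.0], [10.0, 20.0]])
    print(recombine(pop, k=2, sample_weights=mean_weights, n_offspring=5, rng=rng))
```

Offspring are produced one at a time from independent draws, which is what makes the outputs c.i.i.d. given the current population; with the two-individual population above, every child is one of $(0, 0)$, $(5, 10)$ or $(10, 20)$.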

Our formulation seems strange at first sight, but it covers many real-world recombination operators. For example, consider $k = 2$ and $U_1 = U_2 = \frac{1}{2} I$. This operator is the crossover operator taking the mean of its two parents. On the other hand, if $k = 2$ and the distributions of $U_1$ and $U_2$ satisfy

$$\begin{cases} U_1 = \mathrm{Diag}(s_1, s_2, \ldots, s_d), \\ U_2 = I - U_1, \end{cases}$$

where $\mathrm{Diag}$ constructs a diagonal matrix from its inputs and $\{s_i\}$ are i.i.d. random variables taking values in $\{0, 1\}$ satisfying $P(s_i = 0) = P(s_i = 1) = 1/2$, then this operator is the uniform crossover operator, which sets the value at each position from either of the two parents with probability $1/2$.
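The two examples above can be written as weight samplers; the Python sketch below is illustrative, and the function names are hypothetical. For uniform crossover, $U_1 = \mathrm{Diag}(s_1, \ldots, s_d)$ with i.i.d. fair bits and $U_2 = I - U_1$, so each coordinate of the child $U_1 y_1 + U_2 y_2$ is copied from exactly one parent; for the mean crossover, $U_1 = U_2 = \frac{1}{2} I$.

```python
import numpy as np


def uniform_crossover_weights(rng, d):
    """U_1 = Diag(s_1, ..., s_d) with s_j i.i.d., P(s_j = 0) = P(s_j = 1) = 1/2; U_2 = I - U_1."""
    s = rng.integers(0, 2, size=d).astype(float)
    u1 = np.diag(s)
    return u1, np.eye(d) - u1


def mean_crossover_weights(d):
    """U_1 = U_2 = I / 2: the child is the midpoint of its two parents."""
    half = 0.5 * np.eye(d)
    return half, half


def child(u1, u2, y1, y2):
    """Apply (38) with k = 2."""
    return u1 @ y1 + u2 @ y2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y1, y2 = np.arange(6.0), -np.arange(6.0) - 1.0
    u1, u2 = uniform_crossover_weights(rng, 6)
    print(child(u1, u2, y1, y2))                      # each entry is y1[j] or y2[j]
    print(child(*mean_crossover_weights(6), y1, y2))  # midpoint of y1 and y2
```

Both samplers produce weights that do not depend on the parents, which is the independence between $\{U_i\}$ and $\{y_i\}$ required by the formulation.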

Consider the IPM $\Gamma_\infty$. As stated in Section IV-B, we do not give the explicit form of the transition equation in $\Gamma_\infty$. We assume that the IPM is successfully constructed, and that the transition equation is derived by taking $m \to \infty$ in (32). The reason for this approach is not only that deriving the transition equation is generally easier than proving the convergence of the IPM, but also that the formulation in (37) and (38) encompasses many real-world $k$-ary recombination operators. We do not delve into the details of the mechanisms of these operators and derive a transition equation for each one of them. Instead, our approach is general in that, as long as the IPM is successfully constructed, our analysis of the convergence of the IPM can always be applied.

The following theorem is the primary result of our analysis for the $k$-ary recombination operator.

Theorem 11 ($k$-ary recombination). Let $\Gamma$ be the $k$-ary recombination operator, and $\Gamma_\infty$ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then $\Gamma_\infty$ has the stacking property on $U_I$.

Proof. We use the notations and premises in Theorem 9. Refer to Fig. 3. In particular, the sequence $(A_n)$ and the limit $A_\infty$ are given and $A_n \xrightarrow{d} A_\infty \in U_I$ as $n \to \infty$.

We prove that

$$\pi_i(B_{n,n}) \xrightarrow{d} \pi_i(B_{\infty,\infty}) \tag{39}$$

as $n \to \infty$ for any $i \in \mathbb{N}^+$. Then, by Theorem 5, the conclusion follows.

The overall idea for proving (39) is that we first prove convergence in distribution for the $k \cdot i$ selected parents; then, because the recombination operator is continuous, (39) follows.

First, we decompose the operator $\pi_i \circ \Gamma_m : \mathbb{M}^\infty \to \mathbb{M}^i$. $\pi_i \circ \Gamma_m$ generates the $i$ c.i.i.d. outputs one by one. This generation process can also be viewed as first selecting the $i$ groups of $k$ parents at once from the first $m$ elements of the input (in total, the intermediate output is $k \cdot i$ parents, not necessarily distinct), then producing the $i$ outputs one by one by using each group of $k$ parents. In the following, we describe this process mathematically.

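Consider $\Phi_m : \mathbb{M}^\infty \to \mathbb{M}^{k \cdot i}$. Let $x = (x_l)_{l=1}^\infty \in \mathbb{M}^\infty$ and $y = (y_j)_{j=1}^{k \cdot i} = \Phi_m(x)$. Let $\Phi_m$ be described by the probability

$$P(y_j = x_l) = \frac{1}{m} \quad \text{for all } j \in \{1, \ldots, k \cdot i\} \text{ and } l \in \{1, \ldots, m\}, \tag{40}$$

with the $k \cdot i$ draws made independently (uniform sampling with replacement). In essence, $\Phi_m$ describes how to select the $k \cdot i$ parents from $x$.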

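Consider $\Psi_i : \mathbb{M}^{k \cdot i} \to \mathbb{M}^i$. Let

$$u = (u_{1,1}, u_{1,2}, \ldots, u_{1,k}, u_{2,1}, u_{2,2}, \ldots, u_{2,k}, \ldots, u_{i,k}).$$

Let $v = (v_j)_{j=1}^i = \Psi_i(u)$. Let $\Psi_i$ be described by

$$v_j = \sum_{l=1}^{k} U_{j,l}\, u_{j,l} \quad \text{for all } j \in \{1, \ldots, i\}, \tag{41}$$

in which, for each $j$, $(U_{j,1}, \ldots, U_{j,k})$ is decided by the recombination operator $\Gamma$ as in (38), and these weights are independent for different $j$. In essence, $\Psi_i$ describes how to generate the $i$ individuals from the $k \cdot i$ parents.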

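Now it is obvious that $\pi_i \circ \Gamma_m = \Psi_i \circ \Phi_m$. Therefore,

$$\pi_i(B_{\alpha,m}) = (\pi_i \circ \Gamma_m)(A_\alpha) = (\Psi_i \circ \Phi_m)(A_\alpha) \tag{42}$$

for all $m \in \mathbb{N}^+$ and $\alpha \in \mathbb{N}^+ \cup \{\infty\}$.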

Next, consider $\pi_i \circ \Gamma_\infty : \mathbb{M}^\infty \to \mathbb{M}^i$. Let $\Phi_\infty = \pi_{k \cdot i}$; we prove that

$$\mathcal{L}[(\pi_i \circ \Gamma_\infty)(A)] = \mathcal{L}[(\Psi_i \circ \Phi_\infty)(A)], \quad \forall A \in U_I. \tag{43}$$

(43) is almost obvious because both operators generate i.i.d. outputs, and the marginal p.d.f.s of both outputs follow the same distribution, decided by $\Psi_i$ acting on $k$ i.i.d. parents from $A$. In other words, $\Psi_i \circ \Phi_\infty$ is a model of $\pi_i \circ \Gamma_\infty$ on i.i.d. inputs: the outputs they generate on the same i.i.d. input follow the same distribution.

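Since $A_\infty \in U_I$, by (43),

$$\mathcal{L}[\pi_i(B_{\infty,\infty}) = (\pi_i \circ \Gamma_\infty)(A_\infty)] = \mathcal{L}[(\Psi_i \circ \Phi_\infty)(A_\infty)]. \tag{44}$$

Then (39) is equivalent to

$$(\Psi_i \circ \Phi_n)(A_n) \xrightarrow{d} (\Psi_i \circ \Phi_\infty)(A_\infty) \tag{45}$$

as $n \to \infty$ for any $i \in \mathbb{N}^+$.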

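To prove (45), we prove the following two conditions.

1) $\exists N \in \mathbb{N}^+$ such that for all $n > N$, $\Phi_m(A_n) \xrightarrow{d} \Phi_\infty(A_n)$ uniformly as $m \to \infty$, i.e. $\sup_{n > N} \rho[\Phi_m(A_n), \Phi_\infty(A_n)] \to 0$ as $m \to \infty$.

2) $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$ as $n \to \infty$, and $\Phi_\infty(A_\infty)$ is i.i.d.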

These two conditions correspond to the conditions in Theorem 9. Since $\Phi_m$ maps $\mathbb{M}^\infty$ to $\mathbb{M}^{k \cdot i}$, we cannot directly apply Theorem 9. However, it is easy to extend the proof of Theorem 9 to show that these two conditions lead to $\Phi_n(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$ as $n \to \infty$. Then, by (41), it is apparent that $\Psi_i$ is a continuous function of its input and inner randomness. By concatenating the input and the inner randomness using the same technique as in the proof of Theorem 10, (45) can be proved, and hence the theorem.

In the remainder of the proof, we prove conditions 1 and 2. These conditions can be understood by replacing the top line with in Fig. 3.

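Proof of Condition 2: Since $\Phi_\infty = \pi_{k \cdot i} : \mathbb{S}^\infty \to \mathbb{S}^{k \cdot i}$ (recall that $\pi_{k \cdot i}$ can be viewed both as a mapping from $\mathbb{S}^\infty$ to $\mathbb{S}^{k \cdot i}$ and from $\mathbb{M}^\infty$ to $\mathbb{M}^{k \cdot i}$), $\Phi_\infty$ is continuous (see Example 1.2 in [10]). Since $A_n \xrightarrow{d} A_\infty$, by Theorem 3, $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$. Apparently, $\Phi_\infty(A_\infty)$ is i.i.d. Therefore condition 2 is proved.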

It is worth noting that this simple proof comes partly from our extension of $\Psi_i \circ \Phi_\infty$ to inputs $A \notin U_I$. In fact, the only requirement for $\Phi_\infty$ is (43), i.e. $\Psi_i \circ \Phi_\infty$ should model $\pi_i \circ \Gamma_\infty$ on i.i.d. inputs. By defining $\Phi_\infty$ to be $\pi_{k \cdot i}$, it can take non-i.i.d. inputs such as $A_n$. Thus this condition can be proved. In Fig. 3, this corresponds to our freedom of defining $B_{n,\infty}$, $n \in \mathbb{N}^+$.

Proof of Condition 1: To prove condition 1, we first give another representation of $\Phi_m(A_\alpha)$, where $m > k \cdot i$ and $\alpha \in \mathbb{N}^+ \cup \{\infty\}$. This representation is based on the following mutually exclusive cases.

1) The $k \cdot i$ parents chosen from $A_\alpha$ by $\Phi_m$ are distinct.

2) There are duplicates in the $k \cdot i$ parents chosen from $A_\alpha$ by $\Phi_m$.

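Let $s_{m,\alpha}$ be random variables taking values in $\{0, 1\}$, with probability

$$\begin{aligned} p(m) &= P(s_{m,\alpha} = 1) \\ &= P(\Phi_m \text{ chooses } k \cdot i \text{ distinct parents from } A_\alpha) \\ &= \frac{m \cdot (m - 1) \cdots (m - k \cdot i + 1)}{m^{k \cdot i}}. \end{aligned} \tag{46}$$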

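Let $x_{m,\alpha} \in \mathbb{M}^{k \cdot i}$ follow the conditional distribution of the $k \cdot i$ parents when $s_{m,\alpha} = 1$, and $y_{m,\alpha} \in \mathbb{M}^{k \cdot i}$ follow the conditional distribution of the $k \cdot i$ parents when $s_{m,\alpha} = 0$; then $\Phi_m(A_\alpha)$ can be further represented as

$$\Phi_m(A_\alpha) = s_{m,\alpha} \cdot x_{m,\alpha} + (1 - s_{m,\alpha}) \cdot y_{m,\alpha}. \tag{47}$$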

For our purpose, it is not necessary to explicitly describe the distributions of $x_{m,\alpha}$ and $y_{m,\alpha}$. The only useful fact is that, by the exchangeability of $A_\alpha$,

$$\mathcal{L}(x_{m,\alpha}) = \mathcal{L}[\Phi_\infty(A_\alpha)]. \tag{48}$$

To put it another way, $x_{m,\alpha}$ and $\Phi_\infty(A_\alpha)$ both follow the same distribution of $k \cdot i$ distinct individuals from the current exchangeable population $A_\alpha$. Also note that $\{s_{m,\alpha}\}$ are i.i.d. random variables. They are independent of $\{x_{m,\alpha}\}$ and $\{y_{m,\alpha}\}$.

Now consider $P[\Phi_m(A_n) \in A]$ for any $A \in \mathcal{S}^{k \cdot i}$. By conditioning on whether the $k \cdot i$ parents are distinct, we have

$$P[\Phi_m(A_n) \in A] = p(m) \cdot P(x_{m,n} \in A) + [1 - p(m)] \cdot P(y_{m,n} \in A).$$

Then by (48),

$$P[\Phi_m(A_n) \in A] - P[\Phi_\infty(A_n) \in A] = [p(m) - 1] \cdot P[\Phi_\infty(A_n) \in A] + [1 - p(m)] \cdot P(y_{m,n} \in A). \tag{49}$$

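Since $p(m)$, $P[\Phi_\infty(A_n) \in A]$ and $P(y_{m,n} \in A)$ all lie in $[0, 1]$,

$$\begin{aligned} p(m) - 1 &\le [p(m) - 1] \cdot P[\Phi_\infty(A_n) \in A] \\ &\le P[\Phi_m(A_n) \in A] - P[\Phi_\infty(A_n) \in A] \\ &\le [p(m) - 1] \cdot P[\Phi_\infty(A_n) \in A] + [1 - p(m)] \\ &\le 1 - p(m), \end{aligned}$$

i.e. $|P[\Phi_m(A_n) \in A] - P[\Phi_\infty(A_n) \in A]| \le 1 - p(m)$ for all $A$. Taking the supremum over all $A$, we have

$$\sup_{A \in \mathcal{S}^{k \cdot i}} |P[\Phi_m(A_n) \in A] - P[\Phi_\infty(A_n) \in A]| \le 1 - p(m). \tag{50}$$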

The left-hand side of (50) is the total variation distance between $\Phi_m(A_n)$ and $\Phi_\infty(A_n)$. It is an upper bound of the Prokhorov distance (see for its definition and properties). Since the bound $1 - p(m)$ is uniform with respect to $n$ and $p(m) \to 1$ as $m \to \infty$, we have

$$\sup_{n} \rho[\Phi_m(A_n), \Phi_\infty(A_n)] \le 1 - p(m) \to 0 \quad \text{as } m \to \infty. \tag{51}$$

This is exactly condition 1. Therefore this theorem is proved.

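Alternatively, if we do not want to use the total variation distance, we have the following result for any $\Phi_\infty(A_\infty)$-continuity set $A \in \mathcal{S}^{k \cdot i}$:

$$\begin{aligned} &|P[\Phi_n(A_n) \in A] - P[\Phi_\infty(A_\infty) \in A]| \\ &\quad \le |P[\Phi_n(A_n) \in A] - P[\Phi_\infty(A_n) \in A]| + |P[\Phi_\infty(A_n) \in A] - P[\Phi_\infty(A_\infty) \in A]| \\ &\quad \le 1 - p(n) + |P[\Phi_\infty(A_n) \in A] - P[\Phi_\infty(A_\infty) \in A]|. \end{aligned} \tag{52}$$

Since we already proved $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$, by (4) in Theorem 2, $|P[\Phi_\infty(A_n) \in A] - P[\Phi_\infty(A_\infty) \in A]| \to 0$. Then the right-hand side of (52) converges to 0. Noting that $A$ is arbitrary, by applying (4) in Theorem 2 again, $\Phi_n(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$ is proved. □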
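The quantity driving condition 1 is $1 - p(m)$ from (46), which bounds the total variation distance in (50) whatever the input $A_n$ is. The Python sketch below (helper names hypothetical) computes $p(m)$ exactly with rational arithmetic, compares it with a Monte Carlo estimate of drawing $k \cdot i$ indices uniformly with replacement, and checks the elementary union bound $1 - p(m) \le \binom{k \cdot i}{2} / m$, which gives one explicit rate at which the bound in (51) vanishes. The union bound is an addition for illustration; the proof only needs $p(m) \to 1$.

```python
import random
from fractions import Fraction
from math import comb


def p_distinct(m, n_parents):
    """Exact p(m) in (46): n_parents uniform draws with replacement from m items are all distinct."""
    p = Fraction(1)
    for j in range(n_parents):
        p *= Fraction(m - j, m)
    return p


def p_distinct_mc(m, n_parents, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = [rng.randrange(m) for _ in range(n_parents)]
        hits += len(set(draws)) == n_parents
    return hits / trials


if __name__ == "__main__":
    k, i = 2, 3                           # k-ary recombination producing i offspring
    n_parents = k * i
    for m in (10, 100, 1000, 10000):
        gap = 1 - p_distinct(m, n_parents)   # the bound 1 - p(m) in (50) and (51)
        union = Fraction(comb(n_parents, 2), m)
        print(m, float(gap), float(union), p_distinct_mc(m, n_parents, 20000))
```

Nothing in `p_distinct` depends on the population being sampled, which mirrors the remark that $\Phi$ selects parents "blindly"; this is exactly why the bound is uniform in $n$.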

We give a brief discussion of the proof. In our opinion, the most critical step of our proof is decomposing the $k$-ary recombination operator into two sub-operators: one responsible for selecting parents ($\Phi$), the other responsible for combining them ($\Psi$). In addition, for parent selection, the sub-operator does not use the information of fitness values. Rather, it selects parents “blindly” according to its own rules (uniform sampling with replacement). This makes the operator $\Phi$ easier to analyze, because the way it selects parents does not rely on its input. Therefore, we can prove the uniform convergence in (50).

Another point worth mentioning is the choice of Theorem 9 in our proof. Though Theorem 8 and Theorem 9 are symmetric, the difficulties of verifying their conditions are quite different. In fact, it is very difficult to prove the uniform convergence condition in Theorem 8.

Finally, our proof can be easily extended to cover $k$-ary recombination operators that use uniform sampling without replacement to select the parents of each offspring. The overall proof framework stays roughly the same.

E. Summary 

In this section, we analyzed the simple EA within the proposed framework. As the analysis shows, although the convergence of the IPM is rigorously defined, actually proving the convergence for operators usually takes a lot of effort. We derived sufficient conditions under which the convergence of the IPM is guaranteed, and discussed how the IPM is constructed. Then we used various techniques to analyze the mutation operator and the $k$-ary recombination operator. It can be seen that although the sufficient conditions provide general directions for the proofs, there are still many details to be worked out in order to analyze different operators.

To appreciate the significance of our work, it is worth noting that in [1], [2] the convergence of the IPMs of the mutation operator, the uniform crossover operator and the proportionate selection operator was not properly proved, and the issue of stacking operators and iterating the algorithm was not addressed at all. In this paper, however, we have proved the convergence of the IPMs of several general operators. Since these general operators cover the operators studied in [1], [2] as special cases, the convergence of the IPMs of mutation and uniform crossover is actually proved in this paper. Besides, our proof does not depend on the explicit form of the transition equation of the IPM. As long as the IPM is constructed under the i.i.d. assumption, our proof is valid.

As a consequence of our result, consider the explicit form of the transition equation for the uniform crossover operator derived in Section II of [2]. As the authors’ proof was problematic and incomplete, the derivation of the transition equation was not well founded. However, it can be seen that the authors’ derivation is in fact equivalent to constructing the IPM under the i.i.d. assumption. Since we have already proved the convergence of the IPM of the $k$-ary crossover operator, the analysis in [2] regarding the explicit form of the transition equation can be retained.

V. Conclusion and Future Research 

In this paper, we revisited the existing literature on the theoretical foundations of IPMs and proposed an analytical framework for IPMs based on convergence in distribution for random elements taking values in the metric space of infinite sequences. Under the framework, commonly used operators such as mutation and recombination were analyzed. Our approach and analyses are new. There are many topics worth studying in future research.

Perhaps the most immediate topic is to analyze the proportionate selection operator in our framework. The mutation operator and the $k$-ary recombination operator can be readily analyzed partly because they do not use the information of the fitness value. Also, to generate a new individual, these operators draw information from a fixed number of parents. On the other hand, to generate each new individual, the proportionate selection operator gathers and uses the fitness values of the whole population. This makes analyzing proportionate selection difficult. In fact, we have not proven the convergence of the IPM of proportionate selection, though we have obtained the following two partial analytical results under two different metrics: the Prokhorov metric and the total variation metric.

Theorem 12 (Analysis under the Prokhorov metric). Let $\Gamma$ be the combined operator of mutation and proportionate selection in the simple EA, and $\Gamma_\infty$ be the IPM constructed under the i.i.d. assumption with the corresponding transition equation. Assume the objective function $g$ and the conditional p.d.f. for mutation $f_w(x \mid y)$ satisfy the two conditions in Theorem 7. For $\alpha, \beta \in \mathbb{N}^+ \cup \{\infty\}$, let $A_\alpha$ be random elements of $\mathbb{S}^\infty$ with $A_n \xrightarrow{d} A_\infty \in U_I$, and let $B_{\alpha,\beta} = \Gamma_\beta(A_\alpha)$. Then the following statements are true.

1) $B_{n,m} \xrightarrow{d} B_{\infty,m}$ as $n \to \infty$.

2) $B_{\infty,m} \xrightarrow{d} B_{\infty,\infty} \in U_I$ as $m \to \infty$.

Comparing with Theorem 8, it can be seen that condition 2 in Theorem 8 is proved. The only difference is that condition 1 in that theorem, which requires the uniform convergence of $B_{n,m} \xrightarrow{d} B_{\infty,m}$ as $n \to \infty$, has not been proved yet.

Let $\xrightarrow{tv}$ stand for total variation convergence. Our analysis of proportionate selection under the total variation distance yields the following results.

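Theorem 13. For c.i.i.d. operators, if $A_n \xrightarrow{tv} A_\infty$, then $B_{n,m} \xrightarrow{tv} B_{\infty,m}$ uniformly with respect to $m$, i.e. $\sup_m \rho_{TV}(B_{n,m}, B_{\infty,m}) \to 0$ as $n \to \infty$.

Theorem 14. For the proportionate selection operator, $\pi_l(B_{\infty,m}) \xrightarrow{tv} \pi_l(B_{\infty,\infty})$ as $m \to \infty$ for all $l \in \mathbb{N}^+$.

Theorem 15. $B_{\infty,m} \xrightarrow{tv} B_{\infty,\infty}$ if and only if $\pi_l(B_{\infty,m}) \xrightarrow{tv} \pi_l(B_{\infty,\infty})$ uniformly with respect to $l$, i.e. $\sup_l \rho_{TV}(\pi_l(B_{\infty,m}), \pi_l(B_{\infty,\infty})) \to 0$ as $m \to \infty$.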
Comparing with Theorem 8, Theorem 13 proves condition 1, which requires the column-wise uniform convergence in Fig. 3. Theorem 14 proves the convergence of the finite-dimensional distributions of the last row in Fig. 3. However, Theorem 15 states that condition 2 in Theorem 8 requires the uniform convergence of the finite-dimensional distributions of the last row. We have not proven this convergence yet.

In summary, our results show that proving the convergence $B_{\infty,m} \to B_{\infty,\infty}$ is more difficult under the total variation metric than under the Prokhorov metric, while for proving the uniform convergence $B_{n,m} \to B_{\infty,m}$, it is the other way around.
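A standard example, not taken from the analysis above, illustrates why total variation convergence is a much stronger requirement than convergence under the Prokhorov metric. For point masses $\delta_a$ and $\delta_b$ on $\mathbb{R}$, the total variation distance is 1 whenever $a \ne b$, while the Prokhorov distance is $\min(|a - b|, 1)$. Hence $\delta_{1/n} \xrightarrow{d} \delta_0$ although the total variation distance stays at 1 for every $n$. The Python sketch below only encodes these two closed forms (function names hypothetical); it does not compute either metric for general distributions.

```python
def tv_point_masses(a, b):
    """sup_A |delta_a(A) - delta_b(A)|: the set A = {a} separates the two masses unless a == b."""
    return 0.0 if a == b else 1.0


def prokhorov_point_masses(a, b):
    """Prokhorov distance between delta_a and delta_b on the real line: min(|a - b|, 1)."""
    return min(abs(a - b), 1.0)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, tv_point_masses(1.0 / n, 0.0), prokhorov_point_masses(1.0 / n, 0.0))
```

Conversely, a bound on the total variation distance transfers directly to the Prokhorov metric, which is what the proof of condition 1 in Theorem 11 exploits.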

We think further analysis of proportionate selection can be conducted in the following two directions.

1) In the analyses, we tried to prove the stacking property on $U_I$ for the IPM of proportionate selection. Apart from further efforts to prove or disprove this property, it is worth considering modifying the space $U_I$. For example, we can incorporate the rate of convergence into the space. If we can prove the stacking property on $U_I \cap U$, where $U$ is the space of converging sequences with rate $O(h(n))$, it would also be a meaningful result.

2) Another strategy is to bypass the sufficient conditions and return to Definition 5 to prove $Q^k_n \xrightarrow{d} Q^k_\infty$ for every $k$. This is the original method. In essence, it requires studying the convergence of nested integrals.

Apart from proportionate selection, it is also worth studying whether other operators, such as ranking selection, can be analyzed in our framework. As many of these operators do not generate c.i.i.d. offspring, deriving the IPM and proving its convergence is difficult, if not impossible. In this regard, we believe new modeling techniques and extensions of the framework are fruitful directions for further research.

Finally, it is possible to extend the concept of “incidence vectors” proposed by Vose to the continuous search space. After all, as noted by Vose himself, incidence vectors can also be viewed as marginal p.d.f.s of individuals. As a consequence, the cases of EAs on discrete and continuous solution spaces do bear some resemblance. By an easy extension, the incidence vectors in the continuous space can be defined as functions of the form $\sum_i c_i \delta(x - x_i)$, where $\delta$ is the Dirac delta function and $c_i$ is the rational number representing the fraction of the population equal to $x_i$. If similar analyses based on this extension can be carried out, many results in a-ci can be extended to the continuous space.
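The continuous-space incidence vector described above can be represented as a finite map from distinct individuals $x_i$ to fractions $c_i$, and integrating a function against $\sum_i c_i \delta(x - x_i)$ then reduces to a weighted sum. The Python sketch below is a hypothetical helper for illustration, assuming individuals are stored as hashable tuples of coordinates; it is not part of Vose's formulation.

```python
from collections import Counter
from fractions import Fraction


def incidence_vector(population):
    """Map each distinct individual x_i to c_i = (number of copies of x_i) / (population size)."""
    n = len(population)
    return {x: Fraction(count, n) for x, count in Counter(population).items()}


def integrate(incidence, g):
    """Integral of g against sum_i c_i * delta(x - x_i), i.e. sum_i c_i * g(x_i)."""
    return sum(c * g(x) for x, c in incidence.items())


if __name__ == "__main__":
    pop = [(0.5, 1.0), (0.5, 1.0), (2.0, -1.0), (3.0, 0.0)]
    iv = incidence_vector(pop)
    print(iv)                             # fractions 1/2, 1/4, 1/4
    print(integrate(iv, lambda x: x[0]))  # 1.5, the mean of the first coordinate
```

The fractions $c_i$ are rational by construction and sum to 1, matching the description of $c_i$ in the text.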